Two Voice Devs
By Mark and Allen
Episode 192 - Google Cloud Next 2024 Recap
Join Allen Firstenberg and guest host Stefania Pecore on Two Voice Devs as they delve into the exciting announcements and highlights from Google Cloud Next 2024! This episode focuses on the latest advancements in AI and their impact on the healthcare industry, providing valuable insights for developers and tech enthusiasts.
Learn more:
* https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2024-wrap-up
Timestamps:
00:00:00: Introduction
00:01:02: Stefania's background and journey into AI
00:07:20: Stefania's overall experience at Google Cloud Next
00:11:59: Focus on Healthcare and AI applications, including Mayo Clinic's Solution Studio
00:15:38: Exploring the new Gemini product suite and its features like code assistance and data analysis
00:20:44: Discussing Gemini API updates, including the 1.5 public preview with 1M token context window and grounding tools
00:26:06: Vertex AI Agent Builder and its no-code approach to chatbot development
00:33:02: Hardware announcements, including the A3 VM with NVIDIA H100 GPUs
00:35:24: Stefania's reflections on Cloud Next and the value of attending
Tune in to discover the future of AI and its transformative potential, especially in the healthcare sector. Share your thoughts on the Google Cloud Next announcements in the comments below!
Episode 191 - Beyond the Hype: Exploring BERT
This episode of Two Voice Devs takes a closer look at BERT, a powerful language model with applications beyond the typical hype surrounding large language models (LLMs). We delve into the specifics of BERT, its strengths in understanding and classifying text, and how developers can utilize it for tasks like sentiment analysis, entity recognition, and more.
Timestamps:
0:00:00: Introduction
0:01:04: What is BERT and how does it differ from LLMs?
0:02:16: Exploring Hugging Face and the BERT base uncased model.
0:04:17: BERT's pre-training process and tasks: Masked Language Modeling and Next Sentence Prediction.
0:11:11: Understanding the concept of masked language modeling and next sentence prediction.
0:19:45: Diving into the original BERT research paper.
0:27:55: Fine-tuning BERT for specific tasks: Sentiment Analysis example.
0:32:11: Building upon BERT: Exploring the RoBERTa model and its applications.
0:39:27: Discussion on BERT's limitations and its role in the NLP landscape.
Join us as we explore the practical side of BERT and discover how this model can be a valuable tool for developers working with text-based data. We'll discuss its capabilities, limitations, and potential use cases to provide a comprehensive understanding of this foundational NLP model.
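The masked language modeling objective discussed in this episode can be sketched in a few lines. This is a toy illustration only, not BERT's actual pre-training code: real BERT masks about 15% of WordPiece subword tokens (and sometimes replaces or keeps them rather than always using [MASK]), while this sketch works on plain words.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a fraction of tokens with [MASK], returning the corrupted
    sequence and the positions the model would be trained to predict."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n))
    masked = list(tokens)
    for i in positions:
        masked[i] = "[MASK]"
    return masked, positions

tokens = "the cat sat on the mat".split()
masked, positions = mask_tokens(tokens)
```

During pre-training, the model sees the masked sequence and learns to predict the original tokens at the masked positions, which is what gives BERT its bidirectional understanding of context.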
Episode 190 - Google Gemma's Tortoise and Hare Adventure
Embark on a wild race with Gemma as we explore the exciting (and sometimes slow) world of running Google's open-source large language model! We'll test drive different methods, from the leisurely pace of Ollama on a local machine to the speedier Groq platform. Join us as we compare these approaches, analyzing performance, costs, and ease of use for developers working with LLMs. Will the tortoise or the hare win this race?
Learn more:
* Model card: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335
* Ollama: https://ollama.com/
* LangChain.js with Ollama: https://js.langchain.com/docs/integrations/llms/ollama
* Groq: https://groq.com/
Timestamps:
0:00:00 - Introduction
0:03:05 - Getting to Know Gemma: Exploring the Model Card
0:05:30 - Vertex AI Endpoint: Fast Deployment, But at What Cost?
0:13:40 - Ollama: The Tortoise of Local LLM Hosting
0:17:40 - LangChain Integration: Adding Functionality to Ollama
0:21:44 - Groq: The Hare of LLM Hardware
0:26:06 - Comparing Approaches: Speed vs. Cost vs. Control
0:27:35 - Future of Open LLMs and Google Cloud Next
#GemmaSprint
This project was supported, in part, by Cloud Credits from Google
Episode 189 - Farewell, ADR: The Impact on Alexa Developers
The Alexa Developer Rewards Program (ADR) is shutting down, leaving many developers wondering about the future of Alexa skills. Mark and Allen discuss the implications of this change, explore alternative monetization options, and share their thoughts on the future of skill development.
Timestamps:
0:00 - Intro and announcement of the ADR program ending
1:45 - History of the ADR program and its impact on skill development
7:13 - Discussion of the Skill Developer Accelerator Program (SDAP) and Skill Coach
14:04 - Status of AWS credits for skill developers
15:10 - Incentives for building skills in the absence of the ADR program
21:30 - Cost-benefit analysis and the future of skill development
25:48 - Call to action: Share your thoughts on the ADR program ending and the future of skills
Join the conversation and let us know what you think!
Episode 188 - Building Responsible AI with Gemini
As large language models (LLMs) become increasingly powerful, ensuring their responsible use is crucial. In this episode of Two Voice Devs, Allen and Mark delve into Google's Gemini LLM, specifically its built-in safety features designed to prevent harmful outputs like harassment, hate speech, sexually explicit content, and dangerous information.
Join them as they discuss:
(00:01:55) The importance of safety features in LLMs and Google's approach to responsible AI.
(00:03:08) A walkthrough of Gemini's safety settings in AI Studio, including the four categories of evaluation and developer control options.
(00:06:51) Examples of how Gemini flags potentially harmful prompts and responses, and how developers can adjust settings to control output.
(00:08:55) A deep dive into the API, exploring the parameters and responses related to safety features.
(00:19:38) The challenges of handling incomplete responses due to safety violations and the need for better recovery strategies.
(00:26:47) The importance of industry standards and finer-grained control for responsible AI development.
(00:29:00) A call to action for developers and conversation designers to discuss and collaborate on best practices for handling safety issues in LLMs.
This episode offers valuable insights for developers working with LLMs and anyone interested in the future of responsible AI. Tune in and share your thoughts on how we can build safer and more ethical AI systems!
Episode 187 - LLMs in Developer Tools
In this episode of Two Voice Devs, Mark and Allen discuss how developers can leverage AI tools like ChatGPT to improve their workflow. Mark shares his experience using ChatGPT to generate an OpenAPI specification from TypeScript types, saving him significant time and effort. They discuss the benefits and limitations of using AI for code generation, emphasizing the importance of understanding the generated code and maintaining healthy skepticism.
Timestamps:
00:00:00 Introduction
00:00:49 Using AI as a developer tool
00:01:17 Generating OpenAPI specifications with ChatGPT
00:04:02 Mark's prompt and TypeScript types
00:05:37 Reviewing the generated OpenAPI specification
00:07:12 Adding request examples with ChatGPT
00:10:11 Benefits and limitations of AI code generation
00:13:43 Using AI tools for learning and understanding code
00:17:39 Trusting AI-generated code and potential for bias
00:19:04 Integrating AI tools into the development workflow
00:22:38 The future of AI in software development
00:23:17 Programmers as problem solvers, not just code writers
00:25:41 AI as a tool in the developer's toolbox
00:26:07 Call to action: Share your experiences with AI tools
This episode offers valuable insights for developers interested in exploring the potential of AI to enhance their productivity and efficiency.
Episode 186 - Conversational AI with Voiceflow Functions
Join us on Two Voice Devs as we chat with Xavi, Head of Cloud Infrastructure at Voiceflow, about the exciting new Voiceflow Functions feature and the future of conversational AI development. Xavi shares his journey into the world of bots and assistants, dives into the technology behind Voiceflow's infrastructure, and explains how functions empower developers to create custom, reusable components for their conversational experiences.
Timestamps:
- 00:00:00 Introduction
- 00:00:49 Xavi's journey into conversational AI
- 00:06:08 Voiceflow's infrastructure and technology
- 00:09:29 Voiceflow's evolution and direction
- 00:13:28 Introducing Voiceflow Functions
- 00:16:05 Capabilities and limitations of functions
- 00:20:35 Future of Voiceflow Functions
- 00:21:02 Sharing and contributing functions
- 00:24:02 Technical limitations of functions
- 00:25:35 Closing remarks and call to action
Whether you're a seasoned developer or just getting started with conversational AI, this episode offers valuable insights into the evolving landscape of bot development and the powerful capabilities of Voiceflow.
Episode 185 - Cloud vs Local LLMs: A Developer's Dilemma
In this episode of Two Voice Devs, Allen Firstenberg and Roger Kibbe explore the rising trend of local LLMs, smaller language models designed to run on personal devices instead of relying on cloud-based APIs. They discuss the advantages and disadvantages of this approach, focusing on data privacy, control, cost efficiency, and the unique opportunities it presents for developers. They also delve into the importance of fine-tuning these smaller models for specific tasks, enabling them to excel in areas like legal contract analysis and mobile app development.
The conversation dives into various popular local LLM models, including:
- Mistral: Roger's favorite, lauded for its capabilities and ability to run efficiently on smaller machines.
- Phi-2: A tiny model from Microsoft ideal for on-device applications.
- Llama: Meta's influential model, with Llama 2 currently leading the pack and Llama 3 anticipated to be comparable to GPT-4.
- Gemma: Google's new open-source model with potential, but still under evaluation.
Learn more:
- Ollama: https://ollama.com/
- Ollama source: https://github.com/ollama/ollama
- LM Studio: https://lmstudio.ai/
Timestamps:
00:00:00: Introduction and welcome back to Roger Kibbe.
00:01:31: Roger discusses his career path and his passion for voice and AI.
00:06:33: The discussion turns to the larger vs. smaller LLMs.
00:13:52: Understanding key terminology like quantization and fine-tuning.
00:20:58: Roger shares his favorite local LLM models.
00:25:14: Discussing the strengths and weaknesses of smaller models like Gemma.
00:30:32: Exploring the benefits and challenges of running LLMs locally.
00:39:15: The value of local LLMs for developers and individual learning.
00:40:29: The impact of local LLMs on mobile devices and app development.
00:49:27: Closing thoughts and call for audience feedback.
Join Allen and Roger as they explore the exciting potential of local LLMs and how they might revolutionize the development landscape!
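The quantization mentioned in the timestamps above is a key reason local LLMs fit on personal devices. Here is a minimal sketch of symmetric int8 quantization; the formats actually used by local-LLM runtimes (such as 4-bit grouped quantization) are considerably more involved.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map each float weight to an integer
    in [-127, 127] using one shared scale factor, shrinking storage
    from 32 bits per weight to 8."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid 0 for all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
```

The trade-off Roger describes follows directly from this: each weight now takes a quarter of the memory, at the cost of a small rounding error bounded by the scale factor.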
Episode 184 - Large Action Models: The Future of Conversational AI?
Join Allen and Mark on Two Voice Devs as they dive into the world of Large Action Models (LAMs) and explore their potential to revolutionize how we build chatbots and voice assistants.
Inspired by Braden Ream's article "How Large Action Models Work and Change the Way We Build Chatbots and Agents," the discussion dissects the core functions of conversational AI - understand, decide, and respond - and examines how LAMs might fit into this framework.
Allen and Mark also compare and contrast LAMs with Large Language Models (LLMs) and Natural Language Understanding (NLU), highlighting the strengths and limitations of each approach.
Tune in to hear their insights on:
- The evolution of Voiceflow and its shift towards LLMs (03:20)
- Understanding the core functions of conversational AI (05:40)
- Clippy as an example of a deterministic agent (06:15)
- The differences between deterministic and probabilistic models (07:50)
- NLU vs. LLMs for understanding user input (09:20)
- How LAMs might fit into the "decide" stage of conversational AI (18:50)
- The challenges of training LAMs and avoiding hallucinations (20:00)
- The potential of LAMs to improve response generation (29:30)
- Cost considerations of using LLMs vs. NLUs (37:00)
Whether you're a seasoned developer or just curious about the future of conversational AI, this episode offers a thought-provoking discussion on the potential of LAMs and the challenges that lie ahead.
Be sure to share your thoughts in the comments below!
Additional Info:
- https://www.voiceflow.com/blog/large-action-models-change-the-way-we-build-chatbots-again
Episode 183 - Gemini 1.5: One Million Tokens, Endless Possibilities? 🤯
Google's Gemini 1.5 is here, boasting a mind-blowing 1 million token context window! 🤯 Join Allen and Linda as they dive deep into this experimental AI, exploring its capabilities, limitations, and potential use cases. 🤔
They share their experiences testing Gemini 1.5 with original content, including Two Voice Devs transcripts and synthetic videos, and discuss the challenges of finding data that hasn't already been used to train the AI. 🧐
Get ready for a lively discussion on hallucinations, the future of content creation, and the ethical questions surrounding these powerful language models. 🤖
More info:
* https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/
* https://developers.googleblog.com/2024/02/gemini-15-available-for-private-preview-in-google-ai-studio.html
* https://openai.com/sora
Timestamps:
00:00:00 Introduction
00:01:05 Notable features of Gemini 1.5
00:02:57 What is a token?
00:06:39 Linda's test with Danish citizenship PDF
00:09:33 Allen's test with Les Miserables and needle in a haystack
00:12:27 Testing with Data Portability API data
00:14:28 Linda's test with YouTube search history and Netflix recommendations
00:17:44 Allen's test with Two Voice Devs transcripts
00:21:32 Issues with counting and hallucinations
00:24:21 Testing with OpenAI's Sora AI synthetic videos
00:30:05 Ethical questions and the future of content creation
00:31:50 Potential use cases for large context windows
00:36:34 API limitations and challenges
00:37:39 Performance and cost considerations
00:41:34 Comparison with retrieval augmented generation and vector databases
00:44:21 Generating summaries and markers from this transcript
Leave your thoughts and questions in the comments below!
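For the "What is a token?" segment above, a common rule of thumb is that English text averages roughly four characters per token. The exact count depends on the tokenizer (Gemini and other models use subword schemes like BPE or SentencePiece), so treat this sketch as a back-of-the-envelope estimate only:

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using the ~4-characters-per-token heuristic.
    Real tokenizers vary by language and content, so this is approximate."""
    return max(1, len(text) // chars_per_token)

# By this heuristic, a 1M-token context window holds roughly 4 MB of plain text.
short = estimate_tokens("Two Voice Devs")
million = estimate_tokens("a" * 4_000_000)
```

That scale is why Allen and Linda could feed entire transcripts and novels into Gemini 1.5 and still stay within the context window.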
Episode 182 - Bard Becomes Gemini: Why Devs Care
In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss Gemini, Google's latest name for its Generative AI... stuff. Originally known as separate products including Bard and Duet AI, Gemini encompasses a suite of AI tools, including chatbots, product-specific assistants, models, and APIs that developers can use for various tasks. The discussion covers how Gemini compares with offerings from other companies such as OpenAI and Microsoft, including visible similarities and differences. The show concludes by answering why developers should care about this rename, with a call to explore AI tools like Gemini to create more natural and user-friendly interfaces.
Learn more:
- https://blog.google/technology/ai/google-gemini-update-sundar-pichai-2024/
- https://blog.google/products/gemini/bard-gemini-advanced-app/
00:04 Introduction and Catching Up
00:55 Exploring the Gemini Model
04:09 Gemini vs OpenAI: A Comparison
10:20 Understanding the Gemini Branding
12:00 The Developer's Perspective on Gemini
17:46 Closing Thoughts and Future Discussions
Episode 181 - Let Your Web Pages Talk With CSS
In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss the CSS Speech Module Level 1 Candidate Recommendation Draft, a standard that enables webpages to talk, developed in collaboration with the W3C Voice Browser Activity. They explore its features, including the 'aural' box model concept, voice families, earcons, and more, drawing parallels with SSML and highlighting its innovative approach to web accessibility as a complement to screen readers. While acknowledging its potential, they also address some of its key omissions, such as phonemes and the lack of a background audio feature.
00:04 Introduction and Welcome
01:14 Exploring the Concept of Webpages Talking
03:00 Deep Dive into CSS Speech Module
03:48 Understanding the Scope of CSS Speech Module
04:27 The Evolution of Voice Interaction
05:22 Comparing CSS Speech with SSML
07:13 The Power of CSS in Voice Development
22:49 The Impact of Voice Balance Property
29:20 The Limitations of CSS Speech
39:37 The Future of CSS Speech
42:50 Conclusion and Final Thoughts
Episode 180 - Run Rabbit One
Forget Apps! Talking to this Orange Cube Could Change Everything
Is the app model broken? The creators of Rabbit R1, a new voice-first device, certainly think so. In this episode of Two Voice Devs, Mark and Allen break down this innovative device and its potential to change how we interact with technology. What do developers think about the technology underlying RabbitOS? You may be surprised!
Key topics:
- 00:02:00 - What is the Rabbit R1? Rabbit R1 is a new type of device that prioritizes voice input and output. It aims to shift users away from apps and toward a more conversational way of interacting with technology.
- 00:05:17 - AI models: Rabbit uses a unique "large action model" to understand and complete tasks. It claims to do this faster and more intuitively than existing voice assistants.
- 00:14:14 - Teach Me mode: See how Rabbit can be trained to interact with new websites and applications. What implications does this have for the future?
- 00:18:41 - Can it replace apps? While that's a bold claim, Rabbit's conversational approach and innovative features show promise. Could this be the first step towards a new era in human-computer interaction?
Additional thoughts:
- 00:25:06 - Hybrid approach: Rabbit smartly combines intent-based and language-based AI models, potentially offering speed and accuracy.
- 00:32:56 - Asynchronous interactions: It breaks away from the traditional request-response model, offering a more natural conversational experience that aligns with the Star Trek computer vision.
- 00:07:48 - Price: At just $199, many people are willing to check it out, and this could accelerate interest in voice-driven interfaces.
Is Rabbit R1 a game-changer or just a gimmick? Let us know your thoughts in the comments!
Episode 179 - What's New With APL 2023.3
In this episode of 'Two Voice Devs', hosts Allen Firstenberg and Mark Tucker discuss updates in Alexa Presentation Language (APL) version 2023.3. They highlight conditional imports, animation updates, and more, including APL support for different devices and how to "handle" backward compatibility.
Learn More:
https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apl-latest-version.html
00:08 Introduction and Welcome
00:17 Alexa Presentation Language (APL) Overview
01:02 Understanding APL and its Components
03:23 Exploring APL's Functionality and Usage
05:22 APL's Versioning Strategy and Device Compatibility
09:23 New Features in APL 2023.3: Conditional Imports
15:22 New Features in APL 2023.3: Item Insertion and Removal Commands
18:05 New Features in APL 2023.3: Control Over Scrolling and Paging
19:43 New Features in APL 2023.3: Accessibility Improvements
20:36 New Features in APL 2023.3: Frame Component Deprecation
22:23 New Features in APL 2023.3: Data Property for Sequential and Parallel Commands
25:07 New Features in APL 2023.3: Support for Variable Sized Viewports
26:47 New Features in APL 2023.3: Support for Lottie Files
28:33 New Features in APL 2023.3: String Functions and Vector Graphic Improvements
30:11 New Features in APL 2023.3: Extensions and APL Cheat Sheets
37:26 Strategies for Backwards Compatibility in APL
38:40 Conclusion and Farewell
Episode 178 - Looking Forward to 2024
In their New Year's discussion, Mark and Allen explore their hopes and predictions for technological advancements in 2024. They discuss the future of Large Language Models (and if that's the right name for them now), expressing anticipation for improvements in latency issues and the potential for models to be hosted on devices rather than cloud-based platforms. The conversation also ventures into the world of AI agents, function calling, and the importance of developers in ensuring safety measures are integrated in AI systems. Finally, they exude excitement about the possibility of AI in multimedia formats, where tools can generate differing output forms like text, video, images, and possibly even audio directly. They explore potential developer opportunities and challenges, emphasizing the importance of understanding regulations and ensuring user privacy and safety.
00:04 Introduction and New Year Reflections
02:05 Looking Forward: Predictions for 2024
02:14 The Future of Large Language Models (LLMs)
03:08 The Impact of LLMs on Voice Assistants
07:44 The Potential of On-Device AI Models
10:14 The Role of Developers in the AI Landscape
20:11 The Future of Multimodal AI Models
26:35 The Importance of Regulations in AI
29:22 Conclusion: Exciting Times Ahead
Episode 177 - Looking Back at 2023
Allen Firstenberg and Mark Tucker, hosts of Two Voice Devs, reflect on the year 2023, discussing significant changes and trends in the #VoiceFirst and #GenerativeAI industry and where their predictions from last year were accurate... or fell short. They discuss the transformation and challenges Amazon faced, and glean predictions from hints about large language models (LLMs) from Google, Amazon, Microsoft, and Apple. They also mention the shift of Voiceflow towards LLMs and recall the notion of retrieval augmented generation.
00:04 Introduction and Welcome
00:12 Reflecting on the Past Year
01:13 Amazon's Progress and Challenges
01:59 Exploring Amazon's Monetization and Widgets
08:45 Google's Journey and the End of Conversational Actions
11:53 The Rise of Large Language Models (LLMs)
17:04 The Impact of Voiceflow and Dialogflow
20:48 Closing Remarks and New Year Wishes
Episode 176 - The Night Before Tech-mas
Mark and Allen get into the Tech-mas spirit, with a little help from Bard.
Hoping you all have the happiest of holiday seasons.
#GenerativeAI #VoiceFirst #ConversationalAI #HappyHolidays
Episode 175 - Gemini: A First Look
In this in-depth chat between Allen Firstenberg and Linda Lawton, they dive into the functionalities and potential of Google's newly released Gemini model. From their initial experiences to exciting possibilities for the future, they discuss the Gemini Pro and Gemini Pro Vision models, how to #BuildWithGemini, its focus on both text and images, and speedier and more cohesive responses compared to older models. They also delve into its potential for multi-modal support, unique reasoning capabilities, and the challenges they've encountered. The conversation draws interesting insights and sparks exciting ideas on how Gemini could evolve in the future.
00:04 Introduction and Welcome
00:23 Discussing the New Gemini Model
01:33 Comparing Gemini and Bison Models
02:07 Exploring Gemini's Vision Model
03:03 Gemini's Response Quality and Speed
03:53 Gemini's Token Length and Context Window
05:05 Gemini's Pricing and Google AI Studio
05:33 Upcoming Projects and Previews
06:16 Gemini's Role in Code Generation
07:54 Gemini's Model Variants and Limitations
12:01 Creating a Python Desktop App with Gemini
14:07 Gemini's Potential for Assisting the Visually Impaired
18:35 Gemini's Ability to Reason and Count
20:15 Gemini's Multi-Step Reasoning
20:33 Testing Gemini with Multiple Images
21:52 Exploring Image Recognition Capabilities
22:13 Discussing the Limitations of 3D Object Recognition
23:53 Testing Image Recognition with Personal Photos
24:52 Potential Applications of Image Recognition
25:45 Exploring the Multimodal Capabilities of the AI
26:41 Discussing the Challenges of Using the AI in Europe
27:26 Exploring the AQA Model and Its Potential
33:37 Discussing the Future of AI and Image Recognition
37:12 Wishlist for Future AI Capabilities
40:11 Wrapping Up and Looking Forward
Episode 174 - Live and In Person at Voice+AI 2023
Join Allen Firstenberg and guest host Noble Ackerson at the Voice + AI 2023 conference. They discuss the growth of AI and how LLMs (large language models) are affecting the tech world, and delve deep into topics like LangChain, generative AI, and how to optimize AI operations to tackle network latency. There are also plenty of audience questions exploring the current challenges in AI and potential solutions.
00:03 Introduction and Background of Two Voice Devs
00:31 The Evolution of Voice Technology and AI
01:50 Interactive Q&A Session Begins
01:58 Discussion on Open Source Software and Generative AI
02:59 Deep Dive into LangChain
05:43 Audience Participation and Questions
06:00 Challenges with LangChain and Overhead
08:14 Exploring the Intersection of Voice Technology and Generative AI
12:51 Addressing Network Latency in Voice Technology
19:49 The Future of AI and Voice Technology
26:53 Addressing the Challenges of Network Latency
37:13 Closing Remarks and Future Engagements
Episode 173 - Thanksgiving Thoughts 2023
Join Mark Tucker and Allen Firstenberg on Thanksgiving Day for a sincere heart-to-heart on the highs and lows of their tech industry journey. Expressing their gratitude for their family, friends, and colleagues in the tech industry and beyond, they acknowledge the challenging times faced by many. They call on their viewers to remember how unique and important they are and invite them to express their thoughts and emotions openly by reaching out to them.
00:04 Introduction and Thanksgiving Greetings
00:28 Reflecting on the Past Year
02:19 Gratitude for Personal Relationships
03:54 Acknowledging Industry Challenges and Layoffs
05:59 Importance of Community and Support
07:59 Encouragement and Closing Remarks
Episode 172 - VoiceFlow Changes and Solutions
Mark Tucker and Allen Firstenberg delve into the recent changes made by VoiceFlow. We explore how VoiceFlow, originally a design resource for Alexa Skills and Google Assistant Actions, has evolved and shifted to include chatbot roles and generative AI responses. Highlighted too are the implications of VoiceFlow's decoupling and transition to 'bot logic as a service'. We look at the necessary technical adjustments and solutions required in the aftermath of these changes, and Mark shares how he created a Jovo plugin as a hassle-free 'integration layer' for handling multiple platforms, taking advantage of Jovo's generic input/output.
More info:
- https://github.com/jovo-community/jovo4-voiceflowdialog-app
00:04 Introduction
00:54 Introducing VoiceFlow
01:44 Exploring VoiceFlow's Evolution
03:13 Understanding VoiceFlow's Changes
05:39 Explaining the VoiceFlow Integration
14:39 Discussing the VoiceFlow Dialog API
25:42 Conclusion
Episode 171 - Ups and Downs of the OpenAI DevDay Roller Coaster
On this episode, Mark Tucker and Allen Firstenberg dive deep into the latest announcements by OpenAI. They discuss various developments including the launch of GPTs (collections of prompts and documents with configuration settings), the new text-to-speech model, upcoming GPT-4 Turbo, reproducible outputs, and the introduction of the Assistant API. While they express excitement for what these developments could mean for #VoiceFirst, #ConversationAI, and #GenerativeAI, they also voice concerns about discovery solutions, monetization, and the reliance on platform-based infrastructure. Tune in and join the conversation.
More info:
- https://openai.com/blog/new-models-and-developer-products-announced-at-devday
00:04 Introduction and OpenAI Announcements Edition
00:52 Discussion on OpenAI's New Text to Speech Model
02:15 Exploring the Pricing and Quality of OpenAI's Text to Speech Model
02:52 Concerns and Limitations of OpenAI's Text to Speech Model
06:24 Introduction to GPT 4 Turbo
06:48 Benefits and Limitations of GPT 4 Turbo
09:27 Exploring the Features of GPT 4 Turbo
18:52 Introduction to GPTs and Their Potential
22:22 Concerns and Questions About GPTs
32:14 Discussion on the Assistant API
37:32 Final Thoughts and Wrap Up
Episode 170 - At the Hub of MakerSuite and LangChain
Allen and Mark discuss the practical uses and advantages offered by MakerSuite, an API currently available for Google's PaLM #GenerativeAI model. We look at its unique feature that treats prompts like templates, allowing for versatile manipulation of these templates for varying results. We further delve into how it saves these prompts in Google Drive and how this can be linked to LangChain's new hub concept, leading to an effective 'MakerSuite hub.' Finally, we explore if prompts are more like code or content, and how that fits into the development process. What do you think?
More info:
- MakerSuite: https://makersuite.google.com/
- MakerSuite Hub in LangChain JS: https://js.langchain.com/docs/ecosystem/integrations/makersuite
Episode 169 - First Thoughts on TypeChat
Mark and Allen explore TypeChat - a new library from Microsoft that makes prompt engineering for function-like operations in #ConversationalAI easier and more robust. Is this a replacement for Intents? Does it go beyond what we could do with Intent-based systems? Is it lacking something? Let's explore!
Episode 168 - Defining Retrieval Augmented Generation
What started as a casual conversation between Mark and Allen turned into a brief exploration of what Retrieval Augmented Generation (RAG) means in the #GenerativeAI and #ConversationalAI world. Toss in some discussion about VoiceFlow and Google's Vertex AI Search and Conversation and we have another dive into the current hot method to bridge the Fuzzy Human / Digital Computer divide.
Episode 167 - What Does Bard Have to Say to Devs?
Last week, before Google's annual hardware event, Allen teased part of his prediction about Google Assistant and Bard. This week, we'll show the full clip of Allen's prediction and see just how close he was. Then Mark and Allen discuss how recent announcements from OpenAI, Amazon Alexa, and Google compare to each other and, more importantly, what they each mean for developers in a #GenerativeAI, #ConversationalAI, and perhaps even a #VoiceFirst world, and perhaps make a few more predictions about what we'll hear next.
More info:
- Blog post about Assistant With Bard: https://blog.google/products/assistant/google-assistant-bard-generative-ai/
- Announcement at the Made By Google event: https://www.youtube.com/live/pxlaUCJZ27E?si=I1noN-l3LQHgBktp&t=2941
Episode 166 - What's Next at Google Cloud Next 2023
The Google Cloud Next conference is a massive display of the latest technologies and products available from Google Cloud - from AI to Zero-Trust solutions. Unsurprisingly, #MachineLearning was prominent in this year's show, so Mark and Allen take a look at some of the biggest #GenerativeAI and #ConversationalAI announcements this year.
More info:
- https://cloud.google.com/blog/topics/google-cloud-next/next-2023-wrap-up
Episode 165 - Speaking of LLMs and Alexa...
Mark shares the exciting news that Amazon Alexa will soon have a #VoiceFirst #ConversationalAI LLM chat mode! While Allen agrees that this is very exciting news, he still has quite a few questions about how #GenerativeAI technology will fit into Alexa skills. We ask the difficult questions and see what answers are currently out there.
What do you think about this announcement from Alexa?
More info:
- LLM feature description: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2023/09/alexa-llm-fall-devices-services-sep-2023
- Event video: https://youtu.be/_JcP7N0QPOk
Episode 164 - VOICE + AI 2023 Recap
Noble and Allen take a look back at our experiences at this year's VOICE + AI conference. What were the big topics being discussed? The amusing moments? And what do we want to see next year?
#GenerativeAI #ConversationalAI #VoiceFirst
Episode 163 - Using Google's MakerSuite PaLM API for Analytics
Allen and guest host Linda have a wide-ranging conversation, from Linda's career path and her experiences as a Google Developer Expert for Google Analytics to how she leveraged that knowledge while trying out something new with Google's #GenerativeAI tool, MakerSuite, and the PaLM API. We take a close look at how developers can use prompts (more than one!) to help turn a user's request into actionable data structures that feed into an API and get results.
More from Linda:
- https://LindaLawton.DK
- https://daimto.com
#MakerSuiteSprint #LargeLanguageModel
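The multi-prompt pattern discussed in this episode can be sketched without any particular model API: one prompt classifies the request, a second extracts parameters as structured data that an API can consume. The prompts, field names, and stubbed model below are illustrative assumptions, not the ones from the episode:

```python
import json

def fake_llm(prompt):
    # Stand-in for a real MakerSuite / PaLM API call; returns canned
    # answers keyed on which of the two prompts we sent.
    if prompt.startswith("Classify"):
        return "report_request"
    return '{"metric": "pageviews", "days": 7}'

def handle(user_request):
    # Prompt 1: figure out what kind of request this is.
    intent = fake_llm(f"Classify this request: {user_request}")
    if intent != "report_request":
        return None
    # Prompt 2: extract the parameters as JSON the API can consume.
    params = json.loads(fake_llm(
        f"Extract the metric and time range as JSON: {user_request}"
    ))
    # params now feeds directly into a (hypothetical) reporting API.
    return params

print(handle("How many pageviews did we get last week?"))
```

Splitting the work across two prompts keeps each one small and testable, which is the design idea the episode explores.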
Episode 162 - Previewing Voice+AI 2023
We're just days away from the annual VOICE+AI conference, hosted this year in Washington, DC. Both Allen and Noble will be speaking (and hosting a live and in person recording of a future episode!), so we'll give a little preview of what you can hear if you're attending.
Episode 161 - LangChain JS + Matching Engine = ?
Allen and Mark revisit a conversation from episode 146 where they discovered Google had a Vector Database. Now, several months later, Allen has done some work with the Google Cloud Vertex AI Matching Engine and incorporated it into LangChain JS. We discuss why this is important, and how it fits into the overall landscape of LLMs and MLs today. (And Allen has a little announcement towards the end.)
More info:
* Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview
* LangChain JS: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/googlevertexai
Episode 160 - So You Downloaded an LLM. Now what?
This seems like an easy question, right? If you want to do #ConversationalAI or #GenerativeAI on your own machine with a model such as Llama 2, you can just download the model and... well... then what? This is the question posed to guest host Noble Ackerson - and the answer was both more complicated and simpler than Allen could imagine!
Episode 159 - What's New With APL 2023.2?
Amazon has made some changes to the Alexa Presentation Language, dubbing this version 2023.2, and Allen is a bit confused about what these updates bring. Mark, however, clarifies what's new, how it relates to what was previously available, and why some users can benefit from this latest APL release.
Episode 158 - Picture an Embedding, If You Will
One of the neat features we've seen come out of the #GenerativeAI and #ConversationalAI explosion recently has been the attention being paid to text embeddings and how they can be used to radically change how we index and search for things. Allen, however, has recently been working with an image embedding model from Google, including incorporating it into LangChain JS. Mark asks about what that process was like, what this new model lets us do, and starts to explore some of the potential of this new tool that is available for everyone.
References:
- LangChain JS module: https://js.langchain.com/docs/modules/data_connection/experimental/multimodal_embeddings/google_vertex_ai
- Information from Google: https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-image-embeddings
- Google Model Garden info: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/5
- XKCD: https://xkcd.com/1425/
Episode 157 - Three Years... and Still Going!
Three years of Two Voice Devs! There's no doubt that the #VoiceFirst industry has changed over that time, with the rise of #GenerativeAI and #ConversationalAI taking the world by storm. Mark and Allen look back at how the show has evolved over this time, and why we hope you'll be joining us as we continue forward on our journey!
Episode 156 - Go with the Dialogflow CX Flow
Guest Host Xavier Portilla returns to chat with Allen about some of the latest additions to Dialogflow CX. New system functions make some of the processing you can do on inputs easier and faster, while prebuilt flows and flow scoped parameters make it easier to have clearly defined, and reusable, components in your conversation design.
More info:
- https://cloud.google.com/dialogflow/docs/release-notes#July_05_2023
Episode 155 - New Alexa Slot Type is Wild!
Guest host Xavier Portilla joins Allen to take a look at a new slot type that the Alexa team has in public beta. How can this new type be used? How does it differ from previous slot types? And what is a slot type anyway?
Episode 154 - The Philosophical Developer
Guest Host Leslie Pound joins Allen to discuss her perspective on software development and #GenerativeAI: rather than trying to translate our fuzzy side into code, developers should think about how these tools help us become more aware of how users are seeking to be more inspired or creative.
Episode 153 - Between Fuzzy and Discrete With LLMs
Noble Ackerson returns to discuss a recent presentation that Allen made to the Google Developer Group NYC chapter, in which he illustrates how #GenerativeAI can be used as a bridge between the discrete nature of computers and the "fuzzy" nature of humans. He and Noble discuss how Large Language Models, such as OpenAI's GPT and Google's PaLM 2, along with libraries like LangChain, become powerful tools in every developer's toolbox.
Episode 152 - What's the Intent of OpenAI Functions?
Allen is joined by Noble Ackerson to discuss the latest feature that OpenAI has included with its GPT models. Functions provide a well-defined way for developers to turn unstructured human input into a more structured format that can be processed by your code or by a library such as LangChain. We take a look at how they can be used, as well as some of the open questions that remain about their use.
More info:
- https://platform.openai.com/docs/guides/gpt/function-calling
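The core idea is simple to sketch: a function definition is a name plus a JSON Schema for its parameters, and instead of free text the model replies with a function name and a JSON string of arguments that your code dispatches. The sketch below simulates the model's reply rather than calling the API, and the function name and fields are hypothetical (see the docs above for the exact wire format):

```python
import json

# A function definition in the style OpenAI expects: name,
# description, and a JSON Schema for the parameters.
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# A simulated model reply: structured data instead of prose.
model_reply = {
    "function_call": {
        "name": "get_weather",
        "arguments": '{"city": "New York", "unit": "celsius"}',
    }
}

def dispatch(reply, handlers):
    """Turn the model's structured reply into an actual function call."""
    call = reply["function_call"]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return handlers[call["name"]](**args)

result = dispatch(model_reply, {
    "get_weather": lambda city, unit="celsius": f"Weather for {city} in {unit}",
})
print(result)  # Weather for New York in celsius
```

One of the open questions we discuss in the episode is exactly what happens around that `json.loads` step: the arguments are model output, so your code still has to validate them.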
Episode 151 - Requiem for Conversational Actions
This week, Google completed the "sunset" of Conversational Actions for the Google Assistant. Mark and Allen discuss the ups and downs of Actions on Google, how it fit into the #VoiceFirst landscape, and what may come next.
Episode 150 - Another Look Backwards and Forwards
Another milestone episode! Mark and Allen take advantage of the event to look back at our predictions from episode 100, look back at how #VoiceFirst development has changed over the past 50 episodes (and several years), and look forward to what we'll be talking about in the next 50 episodes.
Episode 149 - Recent Projects: Cards and Chains
It's been a busy week! What have we been up to? Mark has released a new set of cards that summarize and illustrate different AI concepts. Called "AI Explorer Cards of Discovery", we chat about the objectives and the process to create this deck. (And there's a special offer for listeners!) Meanwhile, Allen has been working with Google's new PaLM model as part of Google Cloud's Vertex AI platform and has contributed changes to the popular LangChainJS package to make PaLM available through the open source library.
Resources:
* AI Explorer Cards of Discovery: https://bit.ly/ai-cards
* LangChainJS: https://github.com/hwchase17/langchainjs
* Google PaLM: https://cloud.google.com/ai/generative-ai
Episode 148 - AI Voodoo With Vodo Drive
SO MUCH packed into this episode!
Recently, Allen participated in a hackathon sponsored by VoiceFlow, and he used the opportunity to explore ways that LLMs could be used to build on his work talking with spreadsheets in Vodo Drive (see episode 116). He and Mark explore how he did it, from the prompts that were required, to integration with VoiceFlow and Google Apps Script, to how tools like LangChain will help build similar things. We also explore what lessons were learned, how our experience in #VoiceFirst design helps us build good #ConversationalAI tools, how other APIs can (and should!) work alongside AI, and what "fuzzy" roles AI can fill in the modern app experience.
Resources:
* Vodo Drive: https://vodo-drive.com/
* PromptHacks Hackathon: https://prompthacks.devpost.com/
* Vodo AI submission for PromptHacks: https://devpost.com/software/vodo-ai
* VoiceFlow: https://www.voiceflow.com/
* Google Apps Script: https://www.google.com/script/start/
* LangChain: https://github.com/hwchase17/langchain and https://github.com/hwchase17/langchainjs
Episode 147 - Google AI/O 2023 Recap
It's Google I/O time again! And although Allen couldn't attend in person, he and Mark review the latest announcements relevant to #VoiceFirst and #ConversationalAI developers. From new AI availability to AI workspace, with stops along the way to discuss AI powered hardware, there was lots to hear about. Also some subtle hints from what wasn't said. But did we mention the AI?
Learn more:
* https://blog.google/technology/developers/google-io-2023-100-announcements/
Episode 146 - Visions of Vector Databases
We've touched on the use of vector databases as we've started to explore how LLMs and conversational AIs can be useful, but what are they and how do they work? How are they used for more than just LLMs? Mark and Allen explore some of the classic vector DBs, such as HNSW, and some of the newer fully managed ones, including Metal and Pinecone. We even start to ponder what a fully managed embedding and vector db system might look like from the likes of Google, Azure, or AWS, and are surprised that we're closer than we thought!
Resources:
* HNSWlib: https://github.com/nmslib/hnswlib
* Pinecone: https://pinecone.io/
* Metal: https://getmetal.io/
* Google Cloud Vertex AI Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview
* Amazon AWS Bedrock: https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/
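Underneath all of these products is the same operation: store embedding vectors and return the ones nearest a query vector. A brute-force Python sketch of that core idea, with made-up toy vectors (real systems like HNSW build approximate graph indexes precisely so they don't have to compare against every stored vector):

```python
import math

def cosine_similarity(a, b):
    # Similarity of direction: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "database": documents mapped to (made-up) embedding vectors.
db = {
    "doc_cats":   [0.9, 0.1, 0.0],
    "doc_dogs":   [0.8, 0.2, 0.1],
    "doc_stocks": [0.0, 0.1, 0.9],
}

def nearest(query, db, k=2):
    # Rank every document by similarity to the query vector.
    ranked = sorted(db, key=lambda name: cosine_similarity(query, db[name]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.85, 0.15, 0.05], db))  # ['doc_cats', 'doc_dogs']
```

The managed services above replace the linear scan with an index that scales to millions of vectors, but the interface is the same: vectors in, nearest neighbors out.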
Episode 145 - Alexa Widgets are Here!
Long teased, the ability for developers to create Alexa Widgets is finally generally available! Mark, an Alexa Champion, has had access for a while now, so he and Allen discuss what it takes to make a Widget, what's new and different, and how it fits into the #VoiceFirst world of skills.
Episode 144 - Experiments With LangChain (Part 2)
We're still exploring what LangChain can do, and this week we dive into a tutorial put out by the Voiceflow team that discusses some ways that it can be integrated with ChatGPT using LangChain, bringing the #VoiceFirst and #ConversationalAI worlds closer together. Also a great example of how we go about learning and understanding code that is new to us.
Resources:
* The tutorial we were following: https://www.voiceflow.com/blog/voiceflow-assistant-openai-gpt
Episode 143 - Experiments With LangChain (Part 1)
Over the past few weeks, Mark and Allen have been playing with LangChain and OpenAI, exploring where #ConversationalAI and #VoiceFirst design intersect, and we recorded some of our experiments. In this early one, we take a look at how LangChain with a memory chain can work and keep track of what's going on in the conversation. All in just a few lines of code. More significantly, we discuss the role that LangChain can play in putting together AI and other API components to create voice, web, and app-based agents that include AI as part of the NLU or response elements.
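The memory pattern is easy to sketch even without the library: on each turn, the running transcript is prepended to the prompt so the model can resolve references to earlier turns. A minimal, hypothetical Python sketch (the model call is stubbed out; a real chain would send the prompt to OpenAI or another provider):

```python
class ConversationMemory:
    """Keeps the running transcript that gets prepended to each prompt."""
    def __init__(self):
        self.turns = []

    def add(self, who, text):
        self.turns.append((who, text))

    def render(self):
        return "\n".join(f"{who}: {text}" for who, text in self.turns)

def fake_llm(prompt):
    # Stand-in for a real model call; just reports how much
    # conversational context it received.
    return f"(model saw {len(prompt.splitlines())} line(s) of context)"

def chat(memory, user_input):
    memory.add("Human", user_input)
    prompt = memory.render()   # history + new input, in one prompt
    reply = fake_llm(prompt)
    memory.add("AI", reply)    # the reply becomes context for next turn
    return reply

mem = ConversationMemory()
chat(mem, "My name is Allen.")
reply = chat(mem, "What's my name?")
print(reply)  # (model saw 3 line(s) of context)
```

Because the second prompt contained the whole first exchange, a real model could answer "What's my name?" from context - which is exactly what the memory chain buys you.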