Two Voice Devs
By Mark and Allen
Episode 192 - Google Cloud Next 2024 Recap
Join Allen Firstenberg and guest host Stefania Pecore on Two Voice Devs as they delve into the exciting announcements and highlights from Google Cloud Next 2024! This episode focuses on the latest advancements in AI and their impact on the healthcare industry, providing valuable insights for developers and tech enthusiasts.
Learn more:
* https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2024-wrap-up
Timestamps:
00:00:00: Introduction
00:01:02: Stefania's background and journey into AI
00:07:20: Stefania's overall experience at Google Cloud Next
00:11:59: Focus on Healthcare and AI applications, including Mayo Clinic's Solution Studio
00:15:38: Exploring the new Gemini product suite and its features like code assistance and data analysis
00:20:44: Discussing Gemini API updates, including the 1.5 public preview with 1M token context window and grounding tools
00:26:06: Vertex AI Agent Builder and its no-code approach to chatbot development
00:33:02: Hardware announcements, including the A3 VM with NVIDIA H100 GPUs
00:35:24: Stefania's reflections on Cloud Next and the value of attending
Tune in to discover the future of AI and its transformative potential, especially in the healthcare sector. Share your thoughts on the Google Cloud Next announcements in the comments below!
Episode 191 - Beyond the Hype: Exploring BERT
This episode of Two Voice Devs takes a closer look at BERT, a powerful language model with applications beyond the typical hype surrounding large language models (LLMs). We delve into the specifics of BERT, its strengths in understanding and classifying text, and how developers can utilize it for tasks like sentiment analysis, entity recognition, and more.
Timestamps:
0:00:00: Introduction
0:01:04: What is BERT and how does it differ from LLMs?
0:02:16: Exploring Hugging Face and the BERT base uncased model.
0:04:17: BERT's pre-training process and tasks: Masked Language Modeling and Next Sentence Prediction.
0:11:11: Understanding the concept of masked language modeling and next sentence prediction.
0:19:45: Diving into the original BERT research paper.
0:27:55: Fine-tuning BERT for specific tasks: Sentiment Analysis example.
0:32:11: Building upon BERT: Exploring the RoBERTa model and its applications.
0:39:27: Discussion on BERT's limitations and its role in the NLP landscape.
Join us as we explore the practical side of BERT and discover how this model can be a valuable tool for developers working with text-based data. We'll discuss its capabilities, limitations, and potential use cases to provide a comprehensive understanding of this foundational NLP model.
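The masked language modeling objective discussed in this episode can be sketched in a few lines. This is a toy illustration only, not BERT's actual pre-training code: real BERT masks about 15% of WordPiece subword tokens (and sometimes replaces or keeps them rather than always using [MASK]), while this sketch works on plain words.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a fraction of tokens with [MASK], returning the corrupted
    sequence and the positions the model would be trained to predict."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n))
    masked = list(tokens)
    for i in positions:
        masked[i] = "[MASK]"
    return masked, positions

tokens = "the cat sat on the mat".split()
masked, positions = mask_tokens(tokens)
```

During pre-training, the model sees the masked sequence and learns to predict the original tokens at the masked positions, which is what gives BERT its bidirectional understanding of context.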
Episode 190 - Google Gemma's Tortoise and Hare Adventure
Embark on a wild race with Gemma as we explore the exciting (and sometimes slow) world of running Google's open-source large language model! We'll test drive different methods, from the leisurely pace of Ollama on a local machine to the speedier Groq platform. Join us as we compare these approaches, analyzing performance, costs, and ease of use for developers working with LLMs. Will the tortoise or the hare win this race?
Learn more:
* Model card: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335
* Ollama: https://ollama.com/
* LangChain.js with Ollama: https://js.langchain.com/docs/integrations/llms/ollama
* Groq: https://groq.com/
Timestamps:
0:00:00 - Introduction
0:03:05 - Getting to Know Gemma: Exploring the Model Card
0:05:30 - Vertex AI Endpoint: Fast Deployment, But at What Cost?
0:13:40 - Ollama: The Tortoise of Local LLM Hosting
0:17:40 - LangChain Integration: Adding Functionality to Ollama
0:21:44 - Groq: The Hare of LLM Hardware
0:26:06 - Comparing Approaches: Speed vs. Cost vs. Control
0:27:35 - Future of Open LLMs and Google Cloud Next
#GemmaSprint
This project was supported, in part, by Cloud Credits from Google
Episode 189 - Farewell, ADR: The Impact on Alexa Developers
The Alexa Developer Rewards Program (ADR) is shutting down, leaving many developers wondering about the future of Alexa skills. Mark and Allen discuss the implications of this change, explore alternative monetization options, and share their thoughts on the future of skill development.
Timestamps:
0:00 - Intro and announcement of the ADR program ending
1:45 - History of the ADR program and its impact on skill development
7:13 - Discussion of the Skill Developer Accelerator Program (SDAP) and Skill Coach
14:04 - Status of AWS credits for skill developers
15:10 - Incentives for building skills in the absence of the ADR program
21:30 - Cost-benefit analysis and the future of skill development
25:48 - Call to action: Share your thoughts on the ADR program ending and the future of skills
Join the conversation and let us know what you think!
Episode 188 - Building Responsible AI with Gemini
As large language models (LLMs) become increasingly powerful, ensuring their responsible use is crucial. In this episode of Two Voice Devs, Allen and Mark delve into Google's Gemini LLM, specifically its built-in safety features designed to prevent harmful outputs like harassment, hate speech, sexually explicit content, and dangerous information.
Join them as they discuss:
(00:01:55) The importance of safety features in LLMs and Google's approach to responsible AI.
(00:03:08) A walkthrough of Gemini's safety settings in AI Studio, including the four categories of evaluation and developer control options.
(00:06:51) Examples of how Gemini flags potentially harmful prompts and responses, and how developers can adjust settings to control output.
(00:08:55) A deep dive into the API, exploring the parameters and responses related to safety features.
(00:19:38) The challenges of handling incomplete responses due to safety violations and the need for better recovery strategies.
(00:26:47) The importance of industry standards and finer-grained control for responsible AI development.
(00:29:00) A call to action for developers and conversation designers to discuss and collaborate on best practices for handling safety issues in LLMs.
This episode offers valuable insights for developers working with LLMs and anyone interested in the future of responsible AI. Tune in and share your thoughts on how we can build safer and more ethical AI systems!
Episode 187 - LLMs in Developer Tools
In this episode of Two Voice Devs, Mark and Allen discuss how developers can leverage AI tools like ChatGPT to improve their workflow. Mark shares his experience using ChatGPT to generate an OpenAPI specification from TypeScript types, saving him significant time and effort. They discuss the benefits and limitations of using AI for code generation, emphasizing the importance of understanding the generated code and maintaining healthy skepticism.
Timestamps:
00:00:00 Introduction
00:00:49 Using AI as a developer tool
00:01:17 Generating OpenAPI specifications with ChatGPT
00:04:02 Mark's prompt and TypeScript types
00:05:37 Reviewing the generated OpenAPI specification
00:07:12 Adding request examples with ChatGPT
00:10:11 Benefits and limitations of AI code generation
00:13:43 Using AI tools for learning and understanding code
00:17:39 Trusting AI-generated code and potential for bias
00:19:04 Integrating AI tools into the development workflow
00:22:38 The future of AI in software development
00:23:17 Programmers as problem solvers, not just code writers
00:25:41 AI as a tool in the developer's toolbox
00:26:07 Call to action: Share your experiences with AI tools
This episode offers valuable insights for developers interested in exploring the potential of AI to enhance their productivity and efficiency.
Episode 186 - Conversational AI with Voiceflow Functions
Join us on Two Voice Devs as we chat with Xavi, Head of Cloud Infrastructure at Voiceflow, about the exciting new Voiceflow Functions feature and the future of conversational AI development. Xavi shares his journey into the world of bots and assistants, dives into the technology behind Voiceflow's infrastructure, and explains how functions empower developers to create custom, reusable components for their conversational experiences.
Timestamps:
- 00:00:00 Introduction
- 00:00:49 Xavi's journey into conversational AI
- 00:06:08 Voiceflow's infrastructure and technology
- 00:09:29 Voiceflow's evolution and direction
- 00:13:28 Introducing Voiceflow Functions
- 00:16:05 Capabilities and limitations of functions
- 00:20:35 Future of Voiceflow Functions
- 00:21:02 Sharing and contributing functions
- 00:24:02 Technical limitations of functions
- 00:25:35 Closing remarks and call to action
Whether you're a seasoned developer or just getting started with conversational AI, this episode offers valuable insights into the evolving landscape of bot development and the powerful capabilities of Voiceflow.
Episode 185 - Cloud vs Local LLMs: A Developer's Dilemma
In this episode of Two Voice Devs, Allen Firstenberg and Roger Kibbe explore the rising trend of local LLMs, smaller language models designed to run on personal devices instead of relying on cloud-based APIs. They discuss the advantages and disadvantages of this approach, focusing on data privacy, control, cost efficiency, and the unique opportunities it presents for developers. They also delve into the importance of fine-tuning these smaller models for specific tasks, enabling them to excel in areas like legal contract analysis and mobile app development.
The conversation dives into various popular local LLM models, including:
- Mistral: Roger's favorite, lauded for its capabilities and ability to run efficiently on smaller machines.
- Phi-2: A tiny model from Microsoft ideal for on-device applications.
- Llama: Meta's influential model, with Llama 2 currently leading the pack and Llama 3 anticipated to be comparable to GPT-4.
- Gemma: Google's new open-source model with potential, but still under evaluation.
Learn more:
- Ollama: https://ollama.com/
- Ollama source: https://github.com/ollama/ollama
- LM Studio: https://lmstudio.ai/
Timestamps:
00:00:00: Introduction and welcome back to Roger Kibbe.
00:01:31: Roger discusses his career path and his passion for voice and AI.
00:06:33: The discussion turns to the larger vs. smaller LLMs.
00:13:52: Understanding key terminology like quantization and fine-tuning.
00:20:58: Roger shares his favorite local LLM models.
00:25:14: Discussing the strengths and weaknesses of smaller models like Gemma.
00:30:32: Exploring the benefits and challenges of running LLMs locally.
00:39:15: The value of local LLMs for developers and individual learning.
00:40:29: The impact of local LLMs on mobile devices and app development.
00:49:27: Closing thoughts and call for audience feedback.
Join Allen and Roger as they explore the exciting potential of local LLMs and how they might revolutionize the development landscape!
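The quantization mentioned in the timestamps above is a key reason local LLMs fit on personal devices. Here is a minimal sketch of symmetric int8 quantization; the formats actually used by local-LLM runtimes (such as 4-bit grouped quantization) are considerably more involved.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map each float weight to an integer
    in [-127, 127] using one shared scale factor, shrinking storage
    from 32 bits per weight to 8."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid 0 for all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
```

The trade-off Roger describes follows directly from this: each weight now takes a quarter of the memory, at the cost of a small rounding error bounded by the scale factor.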
Episode 184 - Large Action Models: The Future of Conversational AI?
Join Allen and Mark on Two Voice Devs as they dive into the world of Large Action Models (LAMs) and explore their potential to revolutionize how we build chatbots and voice assistants.
Inspired by Braden Ream's article "How Large Action Models Work and Change the Way We Build Chatbots and Agents," the discussion dissects the core functions of conversational AI - understand, decide, and respond - and examines how LAMs might fit into this framework.
Allen and Mark also compare and contrast LAMs with Large Language Models (LLMs) and Natural Language Understanding (NLU), highlighting the strengths and limitations of each approach.
Tune in to hear their insights on:
- The evolution of Voiceflow and its shift towards LLMs (03:20)
- Understanding the core functions of conversational AI (05:40)
- Clippy as an example of a deterministic agent (06:15)
- The differences between deterministic and probabilistic models (07:50)
- NLU vs. LLMs for understanding user input (09:20)
- How LAMs might fit into the "decide" stage of conversational AI (18:50)
- The challenges of training LAMs and avoiding hallucinations (20:00)
- The potential of LAMs to improve response generation (29:30)
- Cost considerations of using LLMs vs. NLUs (37:00)
Whether you're a seasoned developer or just curious about the future of conversational AI, this episode offers a thought-provoking discussion on the potential of LAMs and the challenges that lie ahead.
Be sure to share your thoughts in the comments below!
Additional Info:
- https://www.voiceflow.com/blog/large-action-models-change-the-way-we-build-chatbots-again
Episode 183 - Gemini 1.5: One Million Tokens, Endless Possibilities? 🤯
Google's Gemini 1.5 is here, boasting a mind-blowing 1 million token context window! 🤯 Join Allen and Linda as they dive deep into this experimental AI, exploring its capabilities, limitations, and potential use cases. 🤔
They share their experiences testing Gemini 1.5 with original content, including Two Voice Devs transcripts and synthetic videos, and discuss the challenges of finding data that hasn't already been used to train the AI. 🧐
Get ready for a lively discussion on hallucinations, the future of content creation, and the ethical questions surrounding these powerful language models. 🤖
More info:
* https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/
* https://developers.googleblog.com/2024/02/gemini-15-available-for-private-preview-in-google-ai-studio.html
* https://openai.com/sora
Timestamps:
00:00:00 Introduction
00:01:05 Notable features of Gemini 1.5
00:02:57 What is a token?
00:06:39 Linda's test with Danish citizenship PDF
00:09:33 Allen's test with Les Miserables and needle in a haystack
00:12:27 Testing with Data Portability API data
00:14:28 Linda's test with YouTube search history and Netflix recommendations
00:17:44 Allen's test with Two Voice Devs transcripts
00:21:32 Issues with counting and hallucinations
00:24:21 Testing with OpenAI's Sora AI synthetic videos
00:30:05 Ethical questions and the future of content creation
00:31:50 Potential use cases for large context windows
00:36:34 API limitations and challenges
00:37:39 Performance and cost considerations
00:41:34 Comparison with retrieval augmented generation and vector databases
00:44:21 Generating summaries and markers from this transcript
Leave your thoughts and questions in the comments below!
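For the "What is a token?" segment above, a common rule of thumb is that English text averages roughly four characters per token. The exact count depends on the tokenizer (Gemini and other models use subword schemes like BPE or SentencePiece), so treat this sketch as a back-of-the-envelope estimate only:

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using the ~4-characters-per-token heuristic.
    Real tokenizers vary by language and content, so this is approximate."""
    return max(1, len(text) // chars_per_token)

# By this heuristic, a 1M-token context window holds roughly 4 MB of plain text.
short = estimate_tokens("Two Voice Devs")
million = estimate_tokens("a" * 4_000_000)
```

That scale is why Allen and Linda could feed entire transcripts and novels into Gemini 1.5 and still stay within the context window.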
Episode 182 - Bard Becomes Gemini: Why Devs Care
In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss Gemini, Google's latest name for its Generative AI... stuff. Originally known as separate products including Bard and Duet AI, Gemini encompasses a suite of AI tools, including chatbots, product-specific assistants, models, and APIs that developers can use for various tasks. The discussion covers how Gemini compares with offerings from other companies such as OpenAI and Microsoft, including visible similarities and differences. The show concludes by answering why developers should care about this rename, with a call to explore AI tools like Gemini to create more natural and user-friendly interfaces.
Learn more:
- https://blog.google/technology/ai/google-gemini-update-sundar-pichai-2024/
- https://blog.google/products/gemini/bard-gemini-advanced-app/
00:04 Introduction and Catching Up
00:55 Exploring the Gemini Model
04:09 Gemini vs OpenAI: A Comparison
10:20 Understanding the Gemini Branding
12:00 The Developer's Perspective on Gemini
17:46 Closing Thoughts and Future Discussions
Episode 181 - Let Your Web Pages Talk With CSS
In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss the CSS Speech Module Level 1 Candidate Recommendation Draft, a standard that enables webpages to talk, developed in collaboration with the W3C Voice Browser Activity. They explore its features, including the 'aural' box model concept, voice families, earcons, and more, drawing parallels with SSML and highlighting its innovative approach to web accessibility as a complement to screen readers. While acknowledging its potential, they also address some of its key omissions, such as phonemes and the lack of a background audio feature.
00:04 Introduction and Welcome
01:14 Exploring the Concept of Webpages Talking
03:00 Deep Dive into CSS Speech Module
03:48 Understanding the Scope of CSS Speech Module
04:27 The Evolution of Voice Interaction
05:22 Comparing CSS Speech with SSML
07:13 The Power of CSS in Voice Development
22:49 The Impact of Voice Balance Property
29:20 The Limitations of CSS Speech
39:37 The Future of CSS Speech
42:50 Conclusion and Final Thoughts
Episode 180 - Run Rabbit One
Forget Apps! Talking to this Orange Cube Could Change Everything
Is the app model broken? The creators of Rabbit R1, a new voice-first device, certainly think so. In this episode of Two Voice Devs, Mark and Allen break down this innovative device and its potential to change how we interact with technology. What do developers think about the technology underlying RabbitOS? You may be surprised!
Key topics:
- 00:02:00 - What is the Rabbit R1? Rabbit R1 is a new type of device that prioritizes voice input and output. It aims to shift users away from apps and toward a more conversational way of interacting with technology.
- 00:05:17 - AI models: Rabbit uses a unique "large action model" to understand and complete tasks. It claims to do this faster and more intuitively than existing voice assistants.
- 00:14:14 - Teach Me mode: See how Rabbit can be trained to interact with new websites and applications. What implications does this have for the future?
- 00:18:41 - Can it replace apps? While that's a bold claim, Rabbit's conversational approach and innovative features show promise. Could this be the first step towards a new era in human-computer interaction?
Additional thoughts:
- 00:25:06 - Hybrid approach: Rabbit smartly combines intent-based and language-based AI models, potentially offering speed and accuracy.
- 00:32:56 - Asynchronous interactions: It breaks away from the traditional request-response model, offering a more natural conversational experience that aligns with the Star Trek computer vision.
- 00:07:48 - Price: At just $199, many people are willing to check it out, and this could accelerate interest in voice-driven interfaces.
Is Rabbit R1 a game-changer or just a gimmick? Let us know your thoughts in the comments!
Episode 179 - What's New With APL 2023.3
In this episode of 'Two Voice Devs', hosts Allen Firstenberg and Mark Tucker discuss updates in Alexa Presentation Language (APL) version 2023.3. They highlight conditional imports, animation updates, and more, including APL support for different devices and how to "handle" backward compatibility.
Learn More:
https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apl-latest-version.html
00:08 Introduction and Welcome
00:17 Alexa Presentation Language (APL) Overview
01:02 Understanding APL and its Components
03:23 Exploring APL's Functionality and Usage
05:22 APL's Versioning Strategy and Device Compatibility
09:23 New Features in APL 2023.3: Conditional Imports
15:22 New Features in APL 2023.3: Item Insertion and Removal Commands
18:05 New Features in APL 2023.3: Control Over Scrolling and Paging
19:43 New Features in APL 2023.3: Accessibility Improvements
20:36 New Features in APL 2023.3: Frame Component Deprecation
22:23 New Features in APL 2023.3: Data Property for Sequential and Parallel Commands
25:07 New Features in APL 2023.3: Support for Variable Sized Viewports
26:47 New Features in APL 2023.3: Support for Lottie Files
28:33 New Features in APL 2023.3: String Functions and Vector Graphic Improvements
30:11 New Features in APL 2023.3: Extensions and APL Cheat Sheets
37:26 Strategies for Backwards Compatibility in APL
38:40 Conclusion and Farewell
Episode 178 - Looking Forward to 2024
In their New Year's discussion, Mark and Allen explore their hopes and predictions for technological advancements in 2024. They discuss the future of Large Language Models (and if that's the right name for them now), expressing anticipation for improvements in latency issues and the potential for models to be hosted on devices rather than cloud-based platforms. The conversation also ventures into the world of AI agents, function calling, and the importance of developers in ensuring safety measures are integrated in AI systems. Finally, they exude excitement about the possibility of AI in multimedia formats, where tools can generate differing output forms like text, video, images, and possibly even audio directly. They explore potential developer opportunities and challenges, emphasizing the importance of understanding regulations and ensuring user privacy and safety.
00:04 Introduction and New Year Reflections
02:05 Looking Forward: Predictions for 2024
02:14 The Future of Large Language Models (LLMs)
03:08 The Impact of LLMs on Voice Assistants
07:44 The Potential of On-Device AI Models
10:14 The Role of Developers in the AI Landscape
20:11 The Future of Multimodal AI Models
26:35 The Importance of Regulations in AI
29:22 Conclusion: Exciting Times Ahead
Episode 177 - Looking Back at 2023
Allen Firstenberg and Mark Tucker, hosts of Two Voice Devs, reflect on the year 2023, discussing significant changes and trends in the #VoiceFirst and #GenerativeAI industry and where their predictions from last year were accurate... or fell short. They discuss the transformation and challenges Amazon faced, and glean predictions from hints about large language models (LLMs) from Google, Amazon, Microsoft, and Apple. They also mention the shift of Voiceflow towards LLMs and recall the notion of retrieval augmented generation.
00:04 Introduction and Welcome
00:12 Reflecting on the Past Year
01:13 Amazon's Progress and Challenges
01:59 Exploring Amazon's Monetization and Widgets
08:45 Google's Journey and the End of Conversational Actions
11:53 The Rise of Large Language Models (LLMs)
17:04 The Impact of Voiceflow and Dialogflow
20:48 Closing Remarks and New Year Wishes
Episode 176 - The Night Before Tech-mas
Mark and Allen get into the Tech-mas spirit, with a little help from Bard.
Hoping you all have the happiest of holiday seasons.
#GenerativeAI #VoiceFirst #ConversationalAI #HappyHolidays
Episode 175 - Gemini: A First Look
In this in-depth chat between Allen Firstenberg and Linda Lawton, they dive into the functionalities and potential of Google's newly released Gemini model. From their initial experiences to exciting possibilities for the future, they discuss the Gemini Pro and Gemini Pro Vision models, how to #BuildWithGemini, its focus on both text and images, and speedier and more cohesive responses compared to older models. They also delve into its potential for multi-modal support, unique reasoning capabilities, and the challenges they've encountered. The conversation draws interesting insights and sparks exciting ideas on how Gemini could evolve in the future.
00:04 Introduction and Welcome
00:23 Discussing the New Gemini Model
01:33 Comparing Gemini and Bison Models
02:07 Exploring Gemini's Vision Model
03:03 Gemini's Response Quality and Speed
03:53 Gemini's Token Length and Context Window
05:05 Gemini's Pricing and Google AI Studio
05:33 Upcoming Projects and Previews
06:16 Gemini's Role in Code Generation
07:54 Gemini's Model Variants and Limitations
12:01 Creating a Python Desktop App with Gemini
14:07 Gemini's Potential for Assisting the Visually Impaired
18:35 Gemini's Ability to Reason and Count
20:15 Gemini's Multi-Step Reasoning
20:33 Testing Gemini with Multiple Images
21:52 Exploring Image Recognition Capabilities
22:13 Discussing the Limitations of 3D Object Recognition
23:53 Testing Image Recognition with Personal Photos
24:52 Potential Applications of Image Recognition
25:45 Exploring the Multimodal Capabilities of the AI
26:41 Discussing the Challenges of Using the AI in Europe
27:26 Exploring the AQA Model and Its Potential
33:37 Discussing the Future of AI and Image Recognition
37:12 Wishlist for Future AI Capabilities
40:11 Wrapping Up and Looking Forward
Episode 174 - Live and In Person at Voice+AI 2023
Join Allen Firstenberg and guest host Noble Ackerson at the Voice + AI 2023 conference. They discuss the growth of AI and how LLMs (large language models) are affecting the tech world, and delve deep into topics like LangChain, generative AI, and how to optimize AI operations to tackle network latency. There are also plenty of audience questions exploring the current challenges in AI and potential solutions.
00:03 Introduction and Background of Two Voice Devs
00:31 The Evolution of Voice Technology and AI
01:50 Interactive Q&A Session Begins
01:58 Discussion on Open Source Software and Generative AI
02:59 Deep Dive into LangChain
05:43 Audience Participation and Questions
06:00 Challenges with LangChain and Overhead
08:14 Exploring the Intersection of Voice Technology and Generative AI
12:51 Addressing Network Latency in Voice Technology
19:49 The Future of AI and Voice Technology
26:53 Addressing the Challenges of Network Latency
37:13 Closing Remarks and Future Engagements
Episode 173 - Thanksgiving Thoughts 2023
Join Mark Tucker and Allen Firstenberg on Thanksgiving Day for a sincere heart-to-heart on the highs and lows of their tech industry journey. Expressing their gratitude for their family, friends, and colleagues in the tech industry and beyond, they acknowledge the challenging times faced by many. They call on their viewers to remember how unique and important they are and invite them to express their thoughts and emotions openly by reaching out to them.
00:04 Introduction and Thanksgiving Greetings
00:28 Reflecting on the Past Year
02:19 Gratitude for Personal Relationships
03:54 Acknowledging Industry Challenges and Layoffs
05:59 Importance of Community and Support
07:59 Encouragement and Closing Remarks
Episode 172 - VoiceFlow Changes and Solutions
Mark Tucker and Allen Firstenberg delve into the recent changes made by VoiceFlow. We explore how VoiceFlow, originally a design resource for Alexa Skills and Google Assistant Actions, has evolved and shifted to include chatbot roles and generative AI responses. Highlighted too are the implications of VoiceFlow's decoupling and transition to 'bot logic as a service'. We look at the necessary technical adjustments and solutions required in the aftermath of these changes, and Mark shares how he created a Jovo plugin as a hassle-free 'integration layer' for handling multiple platforms, taking advantage of Jovo's generic input/output.
More info:
- https://github.com/jovo-community/jovo4-voiceflowdialog-app
00:04 Introduction
00:54 Introducing VoiceFlow
01:44 Exploring VoiceFlow's Evolution
03:13 Understanding VoiceFlow's Changes
05:39 Explaining the VoiceFlow Integration
14:39 Discussing the VoiceFlow Dialog API
25:42 Conclusion
Episode 171 - Ups and Downs of the OpenAI DevDay Roller Coaster
On this episode, Mark Tucker and Allen Firstenberg dive deep into the latest announcements by OpenAI. They discuss various developments including the launch of GPTs (collections of prompts and documents with configuration settings), the new text-to-speech model, upcoming GPT-4 Turbo, reproducible outputs, and the introduction of the Assistant API. While they express excitement for what these developments could mean for #VoiceFirst, #ConversationAI, and #GenerativeAI, they also voice concerns about discovery solutions, monetization, and the reliance on platform-based infrastructure. Tune in and join the conversation.
More info:
- https://openai.com/blog/new-models-and-developer-products-announced-at-devday
00:04 Introduction and OpenAI Announcements Edition
00:52 Discussion on OpenAI's New Text to Speech Model
02:15 Exploring the Pricing and Quality of OpenAI's Text to Speech Model
02:52 Concerns and Limitations of OpenAI's Text to Speech Model
06:24 Introduction to GPT 4 Turbo
06:48 Benefits and Limitations of GPT 4 Turbo
09:27 Exploring the Features of GPT 4 Turbo
18:52 Introduction to GPTs and Their Potential
22:22 Concerns and Questions About GPTs
32:14 Discussion on the Assistant API
37:32 Final Thoughts and Wrap Up
Episode 170 - At the Hub of MakerSuite and LangChain
Allen and Mark discuss the practical uses and advantages offered by MakerSuite, an API currently available for Google's PaLM #GenerativeAI model. We look at its unique feature that treats prompts like templates, allowing for versatile manipulation of these templates for varying results. We further delve into how it saves these prompts in Google Drive and how this can be linked to LangChain's new hub concept, leading to an effective 'MakerSuite hub.' Finally, we explore if prompts are more like code or content, and how that fits into the development process. What do you think?
More info:
- MakerSuite: https://makersuite.google.com/
- MakerSuite Hub in LangChain JS: https://js.langchain.com/docs/ecosystem/integrations/makersuite
Episode 169 - First Thoughts on TypeChat
Mark and Allen explore TypeChat - a new library from Microsoft that makes prompt engineering for function-like operations in #ConversationalAI easier and more robust. Is this a replacement for Intents? Does it go beyond what we could do with Intent-based systems? Is it lacking something? Let's explore!
Episode 168 - Defining Retrieval Augmented Generation
What started as a casual conversation between Mark and Allen turned into a brief exploration of what Retrieval Augmented Generation (RAG) means in the #GenerativeAI and #ConversationalAI world. Toss in some discussion about VoiceFlow and Google's Vertex AI Search and Conversation and we have another dive into the current hot method to bridge the Fuzzy Human / Digital Computer divide.
Episode 167 - What Does Bard Have to Say to Devs?
Last week, before Google's annual hardware event, Allen teased part of his prediction about Google Assistant and Bard. This week, we'll show the full clip of Allen's prediction and see just how close he was. Then Mark and Allen discuss how recent announcements from OpenAI, Amazon Alexa, and Google compare to each other and, more importantly, what they each mean for developers in a #GenerativeAI, #ConversationalAI, and perhaps even a #VoiceFirst world, and perhaps make a few more predictions about what we'll hear next.
More info:
- Blog post about Assistant With Bard: https://blog.google/products/assistant/google-assistant-bard-generative-ai/
- Announcement at the Made By Google event: https://www.youtube.com/live/pxlaUCJZ27E?si=I1noN-l3LQHgBktp&t=2941
Episode 166 - What's Next at Google Cloud Next 2023
The Google Cloud Next conference is a massive display of the latest technologies and products available from Google Cloud - from AI to Zero-Trust solutions. Unsurprisingly, #MachineLearning was prominent in this year's show, so Mark and Allen take a look at some of the biggest #GenerativeAI and #ConversationalAI announcements this year.
More info:
- https://cloud.google.com/blog/topics/google-cloud-next/next-2023-wrap-up
Episode 165 - Speaking of LLMs and Alexa...
Mark shares the exciting news that Amazon Alexa will soon have a #VoiceFirst #ConversationalAI LLM chat mode! While Allen agrees that this is very exciting news, he still has quite a few questions about how #GenerativeAI technology will fit into Alexa skills. We ask the difficult questions and see what answers are currently out there.
What do you think about this announcement from Alexa?
More info:
- LLM feature description: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2023/09/alexa-llm-fall-devices-services-sep-2023
- Event video: https://youtu.be/_JcP7N0QPOk
Episode 164 - VOICE + AI 2023 Recap
Noble and Allen take a look back at our experiences at this year's VOICE + AI conference. What were the big topics being discussed? The amusing moments? And what do we want to see next year?
#GenerativeAI #ConversationalAI #VoiceFirst
Episode 163 - Using Google's MakerSuite PaLM API for Analytics
Allen and guest host Linda have a wide-ranging conversation, from Linda's career path and her experiences as a Google Developer Expert for Google Analytics to how she leveraged that knowledge while trying out something new with Google's #GenerativeAI tool, MakerSuite, and the PaLM API. We take a close look at how developers can use prompts (more than one!) to help turn a user's request into actionable data structures that feed into an API and get results.
More from Linda:
- https://LindaLawton.DK
- https://daimto.com
#MakerSuiteSprint #LargeLanguageModel
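The multi-prompt pattern discussed in this episode can be sketched without any particular model API: one prompt classifies the request, a second extracts parameters as structured data that an API can consume. The prompts, field names, and stubbed model below are illustrative assumptions, not the ones from the episode:

```python
import json

def fake_llm(prompt):
    # Stand-in for a real MakerSuite / PaLM API call; returns canned
    # answers keyed on which of the two prompts we sent.
    if prompt.startswith("Classify"):
        return "report_request"
    return '{"metric": "pageviews", "days": 7}'

def handle(user_request):
    # Prompt 1: figure out what kind of request this is.
    intent = fake_llm(f"Classify this request: {user_request}")
    if intent != "report_request":
        return None
    # Prompt 2: extract the parameters as JSON the API can consume.
    params = json.loads(fake_llm(
        f"Extract the metric and time range as JSON: {user_request}"
    ))
    # params now feeds directly into a (hypothetical) reporting API.
    return params

print(handle("How many pageviews did we get last week?"))
```

Splitting the work across two prompts keeps each one small and testable, which is the design idea the episode explores.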
Episode 162 - Previewing Voice+AI 2023
We're just days away from the annual VOICE+AI conference, hosted this year in Washington, DC. Both Allen and Noble will be speaking (and hosting a live and in person recording of a future episode!), so we'll give a little preview of what you can hear if you're attending.
Episode 161 - LangChain JS + Matching Engine = ?
Allen and Mark revisit a conversation from episode 146 where they discovered Google had a Vector Database. Now, several months later, Allen has done some work with the Google Cloud Vertex AI Matching Engine and incorporated it into LangChain JS. We discuss why this is important, and how it fits into the overall landscape of LLMs and MLs today. (And Allen has a little announcement towards the end.)
More info:
* Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview
* LangChain JS: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/googlevertexai
Episode 160 - So You Downloaded an LLM. Now what?
This seems like an easy question, right? If you want to do #ConversationalAI or #GenerativeAI on your own machine with a model such as Llama 2, you can just download the model and... well... then what? This is the question posed to guest host Noble Ackerson - and the answer was both more complicated and simpler than Allen could imagine!
Episode 159 - What's New With APL 2023.2?
Amazon has made some changes to the Alexa Presentation Language, dubbing this version 2023.2, and Allen is a bit confused about what these updates bring. Mark, however, clarifies what's new, how it relates to what was previously available, and why some users can benefit from this latest APL release.
Episode 158 - Picture an Embedding, If You Will
One of the neat features we've seen come out of the #GenerativeAI and #ConversationalAI explosion recently has been the attention being paid to text embeddings and how they can be used to radically change how we index and search for things. Allen, however, has recently been working with an image embedding model from Google, including incorporating it into LangChain JS. Mark asks about what that process was like, what this new model lets us do, and starts to explore some of the potential of this new tool that is available for everyone.
References:
- LangChain JS module: https://js.langchain.com/docs/modules/data_connection/experimental/multimodal_embeddings/google_vertex_ai
- Information from Google: https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-image-embeddings
- Google Model Garden info: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/5
- XKCD: https://xkcd.com/1425/
Episode 157 - Three Years... and Still Going!
Three years of Two Voice Devs! There's no doubt that the #VoiceFirst industry has changed over that time, with the rise of #GenerativeAI and #ConversationalAI taking the world by storm. Mark and Allen look back at how the show has evolved over this time, and why we hope you'll be joining us as we continue forward on our journey!
Episode 156 - Go with the Dialogflow CX Flow
Guest Host Xavier Portilla returns to chat with Allen about some of the latest additions to Dialogflow CX. New system functions make some of the processing you can do on inputs easier and faster, while prebuilt flows and flow scoped parameters make it easier to have clearly defined, and reusable, components in your conversation design.
More info:
- https://cloud.google.com/dialogflow/docs/release-notes#July_05_2023
Episode 155 - New Alexa Slot Type is Wild!
Guest host Xavier Portilla joins Allen to take a look at a new slot type that the Alexa team has in public beta. How can this new type be used? How does it differ from previous slot types? And what is a slot type anyway?
Episode 154 - The Philosophical Developer
Guest Host Leslie Pound joins Allen to discuss her perspective on software development and #GenerativeAI: rather than trying to translate our fuzzy side into code, developers should think about how these tools help us become more aware of how users are seeking to be more inspired or creative.
Episode 153 - Between Fuzzy and Discrete With LLMs
Noble Ackerson returns to discuss a recent presentation that Allen made to the Google Developer Group NYC chapter, in which he illustrates how #GenerativeAI can be used as a bridge between the discrete nature of computers and the "fuzzy" nature of humans. He and Noble discuss how Large Language Models, such as OpenAI's GPT and Google's PaLM 2, along with libraries like LangChain, become powerful tools in every developer's toolbox.
Episode 152 - What's the Intent of OpenAI Functions?
Allen is joined by Noble Ackerson to discuss the latest feature that OpenAI has included with its GPT models. Functions provide a well-defined way for developers to turn unstructured human input into a more structured format that can be processed by your code or by a library such as LangChain. We take a look at how they can be used, as well as some of the open questions that remain about their use.
More info:
- https://platform.openai.com/docs/guides/gpt/function-calling
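The core idea is simple to sketch: a function definition is a name plus a JSON Schema for its parameters, and instead of free text the model replies with a function name and a JSON string of arguments that your code dispatches. The sketch below simulates the model's reply rather than calling the API, and the function name and fields are hypothetical (see the docs above for the exact wire format):

```python
import json

# A function definition in the style OpenAI expects: name,
# description, and a JSON Schema for the parameters.
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# A simulated model reply: structured data instead of prose.
model_reply = {
    "function_call": {
        "name": "get_weather",
        "arguments": '{"city": "New York", "unit": "celsius"}',
    }
}

def dispatch(reply, handlers):
    """Turn the model's structured reply into an actual function call."""
    call = reply["function_call"]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return handlers[call["name"]](**args)

result = dispatch(model_reply, {
    "get_weather": lambda city, unit="celsius": f"Weather for {city} in {unit}",
})
print(result)  # Weather for New York in celsius
```

One of the open questions we discuss in the episode is exactly what happens around that `json.loads` step: the arguments are model output, so your code still has to validate them.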
Episode 151 - Requiem for Conversational Actions
This week, Google completed the "sunset" of Conversational Actions for the Google Assistant. Mark and Allen discuss the ups and downs of Actions on Google, how it fit into the #VoiceFirst landscape, and what may come next.
Episode 150 - Another Look Backwards and Forwards
Another milestone episode! Mark and Allen take advantage of the event to look back at our predictions from episode 100, look back at how #VoiceFirst development has changed over the past 50 episodes (and several years), and look forward to what we'll be talking about in the next 50 episodes.
Episode 149 - Recent Projects: Cards and Chains
It's been a busy week! What have we been up to? Mark has released a new set of cards that summarize and illustrate different AI concepts. Called "AI Explorer Cards of Discovery", we chat about the objectives and the process to create this deck. (And there's a special offer for listeners!) Meanwhile, Allen has been working with Google's new PaLM model as part of Google Cloud's Vertex AI platform and has contributed changes to the popular LangChainJS package to make PaLM available through the open source library.
Resources:
* AI Explorer Cards of Discovery: https://bit.ly/ai-cards
* LangChainJS: https://github.com/hwchase17/langchainjs
* Google PaLM: https://cloud.google.com/ai/generative-ai
Episode 148 - AI Voodoo With Vodo Drive
SO MUCH packed into this episode!
Recently, Allen participated in a hackathon sponsored by VoiceFlow, and he used the opportunity to explore ways that LLMs could be used to build on his work talking with spreadsheets in Vodo Drive (see episode 116). He and Mark explore how he did it, from the prompts that were required, to integration with VoiceFlow and Google Apps Script, to how tools like LangChain will help build similar things. We also explore what lessons were learned, how our experience in #VoiceFirst design helps us build good #ConversationalAI tools, how other APIs can (and should!) work alongside AI, and what "fuzzy" roles AI can fill in the modern app experience.
Resources:
* Vodo Drive: https://vodo-drive.com/
* PromptHacks Hackathon: https://prompthacks.devpost.com/
* Vodo AI submission for PromptHacks: https://devpost.com/software/vodo-ai
* VoiceFlow: https://www.voiceflow.com/
* Google Apps Script: https://www.google.com/script/start/
* LangChain: https://github.com/hwchase17/langchain and https://github.com/hwchase17/langchainjs
Episode 147 - Google AI/O 2023 Recap
It's Google I/O time again! And although Allen couldn't attend in person, he and Mark review the latest announcements relevant to #VoiceFirst and #ConversationalAI developers. From new AI availability to AI workspace, with stops along the way to discuss AI powered hardware, there was lots to hear about. Also some subtle hints from what wasn't said. But did we mention the AI?
Learn more:
* https://blog.google/technology/developers/google-io-2023-100-announcements/
Episode 146 - Visions of Vector Databases
We've touched on the use of vector databases as we've started to explore how LLMs and conversational AIs can be useful, but what are they and how do they work? How are they used for more than just LLMs? Mark and Allen explore some of the classic vector DBs, such as HNSW, and some of the newer fully managed ones, including Metal and Pinecone. We even start to ponder what a fully managed embedding and vector db system might look like from the likes of Google, Azure, or AWS, and are surprised that we're closer than we thought!
Resources:
* HNSWlib: https://github.com/nmslib/hnswlib
* Pinecone: https://pinecone.io/
* Metal: https://getmetal.io/
* Google Cloud Vertex AI Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview
* Amazon AWS Bedrock: https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/
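Underneath all of these products is the same operation: store embedding vectors and return the ones nearest a query vector. A brute-force Python sketch of that core idea, with made-up toy vectors (real systems like HNSW build approximate graph indexes precisely so they don't have to compare against every stored vector):

```python
import math

def cosine_similarity(a, b):
    # Similarity of direction: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "database": documents mapped to (made-up) embedding vectors.
db = {
    "doc_cats":   [0.9, 0.1, 0.0],
    "doc_dogs":   [0.8, 0.2, 0.1],
    "doc_stocks": [0.0, 0.1, 0.9],
}

def nearest(query, db, k=2):
    # Rank every document by similarity to the query vector.
    ranked = sorted(db, key=lambda name: cosine_similarity(query, db[name]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.85, 0.15, 0.05], db))  # ['doc_cats', 'doc_dogs']
```

The managed services above replace the linear scan with an index that scales to millions of vectors, but the interface is the same: vectors in, nearest neighbors out.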
Episode 145 - Alexa Widgets are Here!
Long teased, the ability for developers to create Alexa Widgets is finally generally available! Mark, an Alexa Champion, has had access for a while now, so he and Allen discuss what it takes to make a Widget, what's new and different, and how it fits into the #VoiceFirst world of skills.
Episode 144 - Experiments With LangChain (Part 2)
We're still exploring what LangChain can do, and this week we dive into a tutorial put out by the Voiceflow team that discusses some ways that it can be integrated with ChatGPT using LangChain, bringing the #VoiceFirst and #ConversationalAI worlds closer together. Also a great example of how we go about learning and understanding code that is new to us.
Resources:
* The tutorial we were following: https://www.voiceflow.com/blog/voiceflow-assistant-openai-gpt
Episode 143 - Experiments With LangChain (Part 1)
Over the past few weeks, Mark and Allen have been playing with LangChain and OpenAI, exploring where #ConversationalAI and #VoiceFirst design intersect, and we recorded some of our experiments. In this early one, we take a look at how LangChain with a memory chain can work and keep track of what's going on in the conversation. All in just a few lines of code. More significantly, we discuss the role that LangChain can play in putting together AI and other API components to create voice, web, and app-based agents that include AI as part of the NLU or response elements.
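The memory pattern is easy to sketch even without the library: on each turn, the running transcript is prepended to the prompt so the model can resolve references to earlier turns. A minimal, hypothetical Python sketch (the model call is stubbed out; a real chain would send the prompt to OpenAI or another provider):

```python
class ConversationMemory:
    """Keeps the running transcript that gets prepended to each prompt."""
    def __init__(self):
        self.turns = []

    def add(self, who, text):
        self.turns.append((who, text))

    def render(self):
        return "\n".join(f"{who}: {text}" for who, text in self.turns)

def fake_llm(prompt):
    # Stand-in for a real model call; just reports how much
    # conversational context it received.
    return f"(model saw {len(prompt.splitlines())} line(s) of context)"

def chat(memory, user_input):
    memory.add("Human", user_input)
    prompt = memory.render()   # history + new input, in one prompt
    reply = fake_llm(prompt)
    memory.add("AI", reply)    # the reply becomes context for next turn
    return reply

mem = ConversationMemory()
chat(mem, "My name is Allen.")
reply = chat(mem, "What's my name?")
print(reply)  # (model saw 3 line(s) of context)
```

Because the second prompt contained the whole first exchange, a real model could answer "What's my name?" from context - which is exactly what the memory chain buys you.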