
Two Voice Devs
By Mark and Allen


Episode 240: I/O Eyewear - From Google Glass to Gemini
The buzz from Google IO 2025 is deafening, especially about the new smart glasses announcement! On this episode of Two Voice Devs, Allen Firstenberg and Noble Ackerson — former Google Glass Explorers themselves — dive deep into their first impressions of Google's Project Astra / Android XR / Gemini glasses prototype.
Drawing on their unique experience from the early days of Glass, Allen and Noble discuss the evolution of wearable computing, the collision of conversational AI (Gemini) and spatial computing (Android XR), and what this new device means for the future.
They share their thoughts on the hardware design, the user interface (is it Gemini, Android XR, or both?), and critically examine the product strategy compared to Glass and other devices like the Apple Vision Pro. Most importantly for developers, they ponder the crucial question: what is the developer story here? Is Google providing the necessary tools and documentation, or are we repeating past mistakes?
Tune in for a candid, experienced perspective on Google's latest foray into smart glasses and whether this iteration truly builds on the lessons learned from the past.
0:00:30 - Introduction: Google IO buzz and the glasses question
0:01:16 - Remembering Google Glass: First impressions & the "art of the possible"
0:02:35 - From Glass to Assistant: The evolution of ubiquitous computing
0:03:42 - The Collision: Conversational AI meets Spatial Computing
0:03:58 - First Impressions: Trying on the new Google glasses prototype at IO
0:04:25 - How Glass Shaped Us: Focusing on human factors and product strategy
0:05:44 - The "If You Build It They Will Come" Trap: Why problem-solving is key
0:07:48 - Contrasting with Apple Vision Pro & the "Start with VR" concern
0:09:14 - Breaking Down the Stack: Hardware, Android XR, and Gemini
0:14:24 - Hardware Deep Dive: Weight, balance, optics, and the lower display decision
0:18:38 - UI/Interaction Discussion: Gemini's role, gestures, voice/tap inputs
0:19:37 - The Developer Story: Lack of clarity and need for APIs/documentation
0:27:55 - Rapid Fire: Best thing & biggest irk about the prototype
0:32:16 - The Big Question: Would we buy one today?
0:33:08 - Final Thoughts: Value proposition and learning from Glass
#AndroidXR #Gemini #GoogleGlass #GoogleIO #IO2025 #ProjectAstra #SmartGlasses #WearableTech #SpatialComputing #ConversationalAI #VoiceFirst #VoiceDevs #GlassExplorers #TechPodcast #DeveloperLife #HumanComputerInteraction #ProductStrategy #Google #GoogleDeepMind #DeepMind

Episode 239 - MCP: Hype, Security, and Real-World Use
Join us on Two Voice Devs as Allen Firstenberg talks with Rizel Scarlett, Tech Lead for Open Source Developer Relations at Block. Rizel shares her fascinating journey from psychology student to software engineer and now a leader in developer advocacy, highlighting her passion for teaching and creative problem-solving.
The conversation dives deep into Block's innovative open source work, particularly their AI agent called Goose, which leverages the Model Context Protocol (MCP). Rizel explains what MCP is, seeing it as an SDK or API for AI agents, and discusses the excitement around its potential to democratize coding and other tools for developers and non-developers alike, sharing compelling use cases like automating tasks in Google Docs and interacting with Blender.
However, the discussion doesn't shy away from the critical challenges facing MCP, especially concerning security. Rizel addresses concerns about trusting community-built MCP servers, potential vulnerabilities, and mitigation strategies like allow lists and building internal, vetted servers. They also explore the complexities of exposing large APIs, the demand for local AI for privacy, the current limitations of local models, and the user experience of installing and trusting MCP plugins.
Rizel shares examples of promising MCP servers, including those focused on "long-term memory" and, notably, a speech/voice-controlled coding server, bringing the conversation back to the show's roots in voice development and accessibility, touching upon the concept of temporary disability.
The episode concludes by reflecting on whether MCP is currently a "small, beginner solution" being hyped as a "massive, full-featured" one, the need for more honest conversations about its limitations, and the ongoing efforts within the community and companies like Block to improve the protocol, including discussions around official registries and easier installation methods like deep links.
Tune in for a candid look at the exciting, yet challenging, landscape of AI agents, MCP, and open source development.
More Info:
* Goose - https://github.com/block/goose
* Pieces for Developers - https://pieces.app/features/mcp
* Speech MCP - https://glama.ai/mcp/servers/@Kvadratni/speech-mcp
[00:00:48] Meet Rizel Scarlett & Her Career Journey (Psychology to Dev Advocacy)
[00:03:54] Introducing Block & Its Mission (Square, Cash App, etc.)
[00:04:58] Block's Open Source Division and the Goose AI Agent
[00:05:48] Diving into the Model Context Protocol (MCP)
[00:07:56] What is MCP? (SDK for Agents) & Exciting Use Cases (Democratization, non-developers)
[00:10:36] Major Security Concerns with MCP (Trust, vulnerabilities, typo squatting)
[00:11:48] Mitigation Strategies & Authentication (Allow Lists, Internal Servers, Vetting)
[00:17:59] The Current State of MCP: An Infancy Protocol
[00:20:09] Complexity & Context Window Challenges with MCP Servers
[00:23:14] User Demand for Local AI & Data Privacy
[00:25:31] User Experience of MCP Plugin Installation & Trust
[00:28:42] Examples of Useful MCP Servers (Pieces, Computer Controller, Speech)
[00:31:18] The Power of Voice-Controlled Coding (Accessibility, temporary disability)
[00:33:59] MCP: Hype vs. Reality & The Need for Honest Conversations
[00:36:00] Efforts to Improve MCP (Committees, Registries, Deep Links)
#developer #programming #tech #opensource #block #ai #aiagent #llm #mcp #modelcontextprotocol #devrel #developeradvocacy #security #cybersecurity #privacy #localai #remoteai #accessibility #voicecoding #rizelscarlett #gooseai

Episode 238 - LLM Benchmarking: What, Why, Who, and How
How do you know if a Large Language Model is good for your specific task? You benchmark it! In this episode, Allen speaks with Amy Russ about her fascinating career path from international affairs to data, and how that unique perspective now informs her work in LLM benchmarking.
Amy explains what benchmarking is, why it's crucial for both model builders and app developers, and how it goes far beyond simple technical tests to include societal, cultural, and ethical considerations like preventing harms.
Learn about the complex process involving diverse teams, defining fuzzy criteria, and the technical tools used, including data versioning and prompt template engines. Amy also shares insights on how to get involved in open benchmarking efforts and where to find benchmarks relevant to your own LLM projects.
Whether you're building models or using them in your applications, understanding benchmarking is key to finding and evaluating the best AI for your needs.
Learn More:
* ML Commons - https://mlcommons.org/
Timestamps:
00:18 Amy's Career Path (From Diplomacy to Data)
02:46 What Amy Does Now (Benchmarking & Policy)
03:38 Defining LLM Benchmarking
05:08 Policy & Societal Benchmarking (Preventing Harms)
07:55 The Need for Diverse Benchmarking Teams
09:55 Technical Aspects & Tooling (Data Integrity, Versioning)
10:50 Prompt Engineering & Versioning for Benchmarking
12:48 Preventing Models from Tuning to Benchmarks
15:30 Prompt Template Engines & Generating Prompts
17:10 Other Benchmarking Tools & Testing Nuances
19:10 Benchmarking Compared to Traditional QA
21:45 Evaluating Benchmark Results (Human & Metrics)
23:05 The Challenge of Establishing an Evaluation Scale
23:58 How to Get Started in Benchmarking (Volunteering, Organizations)
25:20 Open Benchmarks & Where to Find Them
26:35 Benchmarking Your Own Model or App
28:55 Why Benchmarking Matters for App Builders
29:55 Where to Learn More & Follow Amy
Hashtags:
#LLM #Benchmarking #AI #MachineLearning #GenAI #DataScience #DataEngineering #PromptEngineering #ModelEvaluation #TechPodcast #Developer #TwoVoiceDevs #MLCommons #QA

Episode 237 - Building Bridges with Developers
Join Allen Firstenberg from Google Cloud Next 2025 as he sits down with Ankur Kotwal, Google's Global Head of Cloud Advocacy. In this episode of Two Voice Devs, Allen and Ankur dive deep into the world of Developer Relations (DevRel) at Google, discussing its crucial role as a bridge connecting Google's product teams and engineers with the global developer community.
Ankur shares his fascinating personal journey, from coding BASIC as a child alongside his developer dad to leading a key part of Google Cloud's developer outreach. They explore the ever-evolving landscape of technology, using the metaphor of "waves" – from early desktop computing and the internet to mobile apps and the current tidal wave of AI and "vibe coding."
This conversation offers valuable insights for all developers navigating the pace of technological change. Discover what Developer Relations is and how it serves as that essential bridge, functioning bidirectionally (both outbound communication and inbound feedback). Learn about the importance of community programs like Google Developer Experts (GDEs), and how developers can effectively connect with DevRel teams to share their experiences and help shape the future of products. Ankur and Allen also reflect on the need for continuous learning, understanding underlying tech layers, and the shared passion that drives innovation in our industry.
Whether you're a long-time developer or just starting out, learn how to ride the waves, connect with peers, and make your voice heard in the developer ecosystem by engaging with the DevRel bridge.
More Info:
* Google Developers Program: https://goo.gle/google-for-developers
Timestamps:
00:49 - Ankur's Role as Global Head of Cloud Advocacy
01:48 - The Bi-directional Nature of Developer Relations
02:34 - Ankur's Journey into Tech and DevRel
09:47 - What is Developer Relations? (The DevRel Bridge Explained)
12:06 - The Value of Community and Google Developer Experts (GDEs)
14:08 - Allen's Motivation for Being a GDE
18:24 - Riding the Waves of Technological Change (AI, Vibe Coding)
20:37 - The Importance of Understanding Abstraction Layers
25:41 - How Developers Can Engage with the DevRel Bridge
30:50 - Providing Feedback: Does it Make a Difference?
Hashtags:
#DeveloperRelations #DevRel #GoogleCloud #CloudAdvocacy #DeveloperCommunity #TechEvolution #AI #ArtificialIntelligence #VibeCoding #GoogleGemini #SoftwareDevelopment #Programming #Google #GoogleCloudNext #GoogleDevRel #GDG #GDE #TwoVoiceDevs #Podcast #Developers

Episode 236 - AI, Agents, and Sphere Magic Live from Cloud Next 2025
Join Allen Firstenberg and Alice Keeler, the Two Voice Devs, live from Day 1 of Google Cloud Next 2025 in Las Vegas! In this episode, recorded amidst the energy of the show floor, Allen and Alice dive into the major announcements and highlights impacting developers, especially those interested in AI and conversational interfaces.
Alice, known as the "Queen of Spreadsheets" and a Google Developer Expert for Workspace and AppSheet, shares her unique perspective on using accessible tools like Apps Script for real-world solutions, contrasting it with the high-end tech on display.
They unpack the new suite of generative AI models announced, including Veo for video, Chirp 3 for audio, Lyria for music generation, and updates to Imagen, all available on Vertex AI. They recount the breathtaking private premiere at Sphere, discussing how Google DeepMind's cutting-edge AI enhanced the classic Wizard of Oz film, expanding and interpolating scenes that never existed – and connect this advanced technology back to tools developers can use today.
A major focus is the new Agent Builder, a tool poised to revolutionize how developers create multimodal AI agents capable of natural voice, text, and image interactions, demonstrated through exciting examples. They discuss the accessibility of this tool for developers of all levels and its potential to automate tedious tasks and create entirely new user experiences.
Plus, they touch on the new Agent to Agent Protocol for complex AI workflows, updates to AI Studio, and the production readiness of the Gemini 2.0 Live API.
Get a developer's take on the biggest news from Google Cloud Next 2025 Day 1 and a look ahead to the developer keynote.
More Info:
* Google Developers Program: https://goo.gle/google-for-developers
* Next 2025 Announcements: https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2025-wrap-up
00:00:31 Welcome to Google Cloud Next 2025
00:01:18 Meet Alice Keeler: Math Teacher, GDE, and Apps Script Developer
00:03:44 Apps Script: Accessible Development & Real-World Solutions
00:05:40 Cloud Next 2025 Day 1 Keynote Highlights
00:06:18 New Generative AI Models: Veo (Video), Chirp 3 (Audio), Lyria (Music), Imagen Updates
00:09:00 The Sphere Experience & DeepMind's Wizard of Oz AI Enhancement
00:14:00 From Hollywood Magic to Public Tools: Vertex AI Capabilities
00:16:30 Agent Builder: The Future of AI Agents & Accessible Development
00:23:37 Agent to Agent Protocol: Enabling Complex AI Workflows
00:25:20 Other Developer News: AI Studio Revamp & Gemini 2.0 Live API
00:26:30 Connecting with Experts & Discovering What's Next
#GoogleCloudNext #GCNext #LasVegasSphere #SphereLasVegas #TwoVoiceDevs #AI #GenerativeAI #VertexAI #Gemini #AgentBuilder #AppsScript #Developers #LowCode #NoCode #AIInEducation #AIDevelopment #ConversationalAI #VoiceAI #MachineLearning #WizardOfOz

Episode 235 - A Developer's Dive into MCP
Following up on our recent conversation about the Model Context Protocol (MCP), Mark and Allen take a step deeper from a developer's perspective. While still in the shallow end, they explore the TypeScript SDK, the MCP Inspector tool, and the Smithery.ai registry to understand how developers define and host MCP servers and tools.
They look at code examples for both local (Standard IO) and potentially remote (Streamable HTTP) server implementations, discussing how tools, resources, and prompts are registered and interact. They also touch on the challenges of configuration, authentication, and the practical messy realities encountered when trying to use MCP tools in clients like Claude Desktop.
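If you want to follow along at home, here is a rough sketch of the kind of local server definition the TypeScript SDK encourages. It follows the shape of the SDK's published stdio examples rather than the exact code from the episode, and import paths and helper names may have shifted between SDK versions, so treat it as illustrative:

```typescript
// Minimal local MCP server over standard IO (a sketch, not the episode's exact code).
// Assumes the @modelcontextprotocol/sdk TypeScript package and zod for schemas.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "demo-server", version: "0.1.0" });

// Register a tool: a named capability the client (e.g. Claude Desktop) can invoke.
server.tool(
  "add",
  { a: z.number(), b: z.number() },
  async ({ a, b }) => ({
    content: [{ type: "text", text: String(a + b) }],
  })
);

// Register a resource: read-only data the client can pull into context.
server.resource("greeting", "greeting://hello", async (uri) => ({
  contents: [{ uri: uri.href, text: "Hello from an MCP resource" }],
}));

// Connect over standard IO; a remote server would use an HTTP-based transport instead.
const transport = new StdioServerTransport();
await server.connect(transport);
```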
This code dive generates more questions than answers about the practical hosting models, configuration complexities, and the vision for MCP in the AI ecosystem. Is it the USB-C of AI tools, or more like a 9-pin serial port needing detailed manual setup? Join Mark and Allen as they navigate the current state of MCP code and ponder its future role.
If you have insights into these complexities or are building with MCP, they'd love to hear from you!
00:40 Following up on the previous MCP episode
01:20 Reconsidering MCP's purpose and metaphors
03:25 Practical challenges with clients (like Claude Desktop) and configuration
05:00 Discussing future AI interfaces and app integration
09:15 Understanding Local vs. Remote MCP servers and hosting models
12:10 Comparing MCP setup to early web development (CGI)
13:20 Diving into the MCP TypeScript SDK code (Standard IO, HTTP transports)
23:00 Running a local MCP server and using the Inspector tool
23:50 Code walkthrough: Defining tools, resources, and prompts
31:15 Exploring remote (HTTP) connection options in the Inspector
32:30 Introducing Smithery.ai as a potential MCP registry
33:45 Navigating the Smithery registry and encountering configuration confusion
36:15 Analyzing server source code vs. registry listings - Highlighting discrepancies
44:30 Reflecting on the current practical usability and complexity of MCP
46:10 Analogy: MCP as a serial port vs. USB-C
#ModelContextProtocol #MCP #AIDevelopment #DeveloperTools #Programming #TypeScript #APIs #ToolsForAI #LLMTools #TechPodcast #SoftwareDevelopment #TwoVoiceDevs #AI #GenerativeAI #Anthropic #Google #LangChain #Coding #AIAPI

Episode 234 - Decoding MCP: Revolution or Confusion?
Join Allen Firstenberg and Michal Stanislawek on Two Voice Devs as they dive deep into the Model Context Protocol (MCP), a protocol proposed by Anthropic that's gaining traction in the AI landscape. What exactly is MCP, and is it the key to seamless integration of external services with large language models?
In this insightful discussion, Allen and Michal unravel the complexities of MCP, exploring its potential to solve integration pain points, its current implementation challenges with local "servers," and the crucial missing pieces like robust authentication and monetization. They also discuss the implications of MCP for AI applications, compare it to established protocols, and ponder its relationship with Google's newly announced Agent to Agent (A2A) protocol.
Is MCP a game-changer that will empower natural language interaction with all kinds of software, from Blender to Slack? Or are there fundamental hurdles to overcome before it reaches its full potential? Tune in to get a developer's perspective on this evolving technology and understand its possible future in the world of AI.
Timestamps:
00:00:55: What is MCP and what does it stand for?
00:02:35: What pain points is MCP trying to solve?
00:04:35: The local nature of current MCP "servers" and its implications.
00:07:15: MCP as a communication protocol and the concept of "tools."
00:08:35: The potential for MCP server discovery and the lack thereof currently.
00:10:25: Security and trust concerns with local MCP servers.
00:13:30: The intended architecture of MCP and the local server model.
00:16:35: The absence of built-in authentication and authorization in MCP.
00:18:35: MCP as a standardized framework and the "plugin" analogy.
00:20:35: MCP's role in defining "AI apps."
00:22:35: The need for a registry component for broader adoption.
00:23:35: What MCP clients currently exist and the breadth of adoption.
00:26:25: MCP and its application in the context of AI agents.
00:29:25: What is still needed for widespread adoption of remote MCP servers?
00:35:15: The concept of an MCP "meta server" or aggregator.
00:38:55: How does Google's Agent to Agent (A2A) protocol fit in?
00:41:45: The debate between MCP servers and specialized AI agents.
00:43:15: The right level of abstraction for tool definitions.
00:46:05: The future evolution of MCP and the importance of experimentation.
#MCP #ModelContextProtocol #AI #LargeLanguageModels #LLM #Anthropic #Claude #ClaudeDesktop #ClaudeOS #Google #Agent2Agent #A2A #GeminiOS #ServerClient #AIAgents #Developer #TechPodcast #TwoVoiceDevs #APIs #SoftwareIntegration #FutureofAI

Episode 233 - Generative UI & Fine-Tuning: Turning Magic into Tech
Following up on last week's captivating discussion, Allen Firstenberg and Noble Ackerson dive deeper into the world of Generative UI. Explore real-world examples of its potential pitfalls and discover how Noble is tackling these challenges through innovative approaches.
This episode unveils the power of dynamically adapting user interfaces based on preferences and intent, ultimately aiming for outcome-focused experiences that seamlessly guide users to their goals. Inspired by the insightful quotes from Arthur C. Clarke ("Any sufficiently advanced technology is indistinguishable from magic") and Larry Niven ("Any sufficiently advanced magic is indistinguishable from technology"), we explore how fine-tuning Large Language Models (LLMs) can bridge this gap.
Noble shares a practical demonstration of a smart home dashboard leveraging Generative UI and then delves into the crucial technique of fine-tuning LLMs. Learn why fine-tuning isn't about teaching new knowledge but rather new patterns and vocabulary to better understand domain-specific needs, like rendering accessible and effective visualizations. We demystify the process, discuss essential hyperparameters like learning rate and training epochs, and explore the practicalities of deploying fine-tuned models using tools like Google Cloud Run.
Join us for an insightful conversation that blends cutting-edge AI with practical software engineering principles, revealing how seemingly magical user experiences are built with careful technical considerations.
Timestamps:
0:00:00 Introduction and Recap of Generative UI
0:03:20 Demonstrating Generative UI Pitfalls with a Smart Home Dashboard
0:05:15 Dynamic Adaptation and User Intent
0:11:30 Accessibility and Customization in Generative UI
0:13:30 Encountering Limitations and the Need for Fine-Tuning
0:17:50 Introducing Fine-Tuning for LLMs: Adapting Pre-trained Models
0:19:30 Fine-Tuning for New Patterns and Domain-Specific Understanding
0:20:50 The Role of Training Data in Supervised Fine-Tuning
0:23:30 Generalization of Patterns by LLMs
0:24:20 Exploring Key Fine-Tuning Hyperparameters: Learning Rate and Training Epochs
0:30:30 Demystifying Supervised Fine-Tuning and its Benefits
0:33:30 Saving and Hosting Fine-Tuned Models: Hugging Face and Google Cloud Run
0:36:50 Integrating Fine-Tuned Models into Applications
0:38:50 The Model is Not the Product: Focus on User Value
0:39:40 Closing Remarks and Teasing Future Discussions on Monitoring
Hashtags:
#GenerativeUI #AI #LLM #LargeLanguageModels #FineTuning #MachineLearning #UserInterface #UX #Developers #Programming #SoftwareEngineering #CloudComputing #GoogleCloudRun #GoogleGemini #GoogleGemma #HuggingFace #AIforDevelopers #TechPodcast #TwoVoiceDevs #ArtificialIntelligence #TechMagic

Episode 232 - Generative UI: The Future of Dynamic User Interfaces?
Allen and Noble dive deep into the fascinating world of Generative UI, a concept that goes beyond simply using AI to design interfaces and explores the possibility of UIs dynamically generated in real-time by LLMs, tailored to individual user needs and context. Noble, a returning Google Developer Expert in AI, clarifies the crucial distinction between generative UI and AI-aided UI generation. They discuss potential applications like dynamic menus and personalized settings, while also tackling the challenges around predictability, usability, and the role of established design patterns. Discover how agents, constrained within defined boundaries, can power this technology and the current limitations when it comes to generating complex UI components. Join the conversation as they explore the cutting edge of how AI could revolutionize the way we interact with software.
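To make the "agents on rails" idea a little more concrete, here is a hedged sketch of one way a generative UI system can constrain the model: instead of emitting arbitrary markup, the LLM is forced to return a component spec that matches a schema the renderer already understands. The packages (@langchain/google-genai, zod) and the component schema are illustrative assumptions, not the demo from the episode:

```typescript
// A sketch of constraining an LLM to a known UI component vocabulary ("agents on rails").
// Assumes @langchain/google-genai and zod; the schema below is a made-up example.
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { z } from "zod";

// The only components the renderer knows how to draw; the model must pick from these.
const componentSpec = z.object({
  component: z.enum(["barChart", "lineChart", "statCard"]),
  title: z.string(),
  dataKeys: z.array(z.string()).describe("Which fields of the dataset to plot"),
});

const model = new ChatGoogleGenerativeAI({ model: "gemini-1.5-flash" })
  .withStructuredOutput(componentSpec);

// The spec (not raw HTML) is what gets generated; rendering stays deterministic.
const spec = await model.invoke(
  "The user asked to compare monthly energy usage across rooms. Pick a component."
);
console.log(spec); // e.g. { component: "barChart", title: "...", dataKeys: [...] }
```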
Timestamps:
00:00:00 - Introduction and Noble's return as a Google Developer Expert in AI
00:02:00 - Defining Generative UI and distinguishing it from AI-aided design
00:03:30 - Exploring potential examples of Generative UI based on user needs and context
00:04:45 - The difference between traditional static UIs and dynamic generative UIs
00:06:45 - How LLMs can be leveraged for real-time UI generation
00:07:15 - The overlap and distinction between Generative UI and Conversational UI
00:08:30 - Challenges of Generative UI: Predictability and guiding users
00:09:30 - The importance of maintaining established UX patterns in Generative UI
00:12:30 - Traditional UI limitations and the promise of personalized generative UIs
00:14:00 - Context-specific information access and adapting to user roles
00:15:30 - An example of Generative UI in a business intelligence dashboard
00:17:00 - A six-stage pipeline for how Generative UI systems might work
00:19:00 - The concept of "agents on rails" in the context of UI generation
00:20:30 - The reasoning and tool-calling aspects of generative UI agents
00:22:30 - Tools as the core of UI generation and component recognition challenges
00:24:30 - Demonstrating the dynamic generation of UI components (charts)
00:27:30 - Exploring interactions and limitations of the generative UI demo
00:29:15 - The "hallucination" of UI components and the need for fine-tuning
00:31:30 - Conclusion and future discussion on component fine-tuning
#GenerativeUI #AI #LLM #UserInterface #UX #AIDesign #DynamicUI #TwoVoiceDevs #GoogleDevelopersExperts #TechPodcast #SoftwareDevelopment #WebDevelopment #AIAgents

Episode 231 - DeepSeek AI: Beating the Odds with Older Tech
DeepSeek AI is turning heads, achieving incredible results with older hardware and clever techniques! Join Allen and Roya as they unravel the secrets behind DeepSeek's success, from their unique attention mechanisms to their cost-effective AI training strategies. But is all as it seems? They also tackle the controversies surrounding DeepSeek, including accusations of data plagiarism and concerns about censorship. This episode is a must-listen for anyone interested in the future of AI!
Timestamps:
0:00 Why DeepSeek is creating buzz
1:06 Unveiling DeepSeek's Two Key Models
2:59 Understanding the Power of Attention
4:12 What is the latent space?
5:55 The nail salon example: Multi-Head Attention Explained
10:02 The doctor/cook/police analogy: Mixture of Experts Explained
13:51 AI vs. AI: DeepSeek's Cost-Saving Training Method
16:01 Hallucinations: Is AI Training Too Risky?
20:59 What are Reasoning Models and Why Do They Matter?
26:53 LLMs are pattern systems explained
28:22 How DeepSeek is using old GPUs
32:53 OpenAI vs. DeepSeek: The Data Plagiarism Debate
39:32 Political Correctness: The Challenge of Guardrails in AI
42:16 Why Open Source is Crucial for the Future of AI
43:20 Run DeepSeek locally with Ollama
43:56 Final Thoughts
Hashtags: #DeepSeek #AI #LLM #Innovation #TechNews #Podcast #ArtificialIntelligence #MachineLearning #Ethics #OpenAI #DataPrivacy #Censorship #TwoVoiceDevs #DeepLearning #ReasoningModel #AIRevolution #ChinaTech

Episode 230 - Is AI Making Alexa Development Fun Again?
Amazon has announced Alexa Plus, powered by large language models (LLMs), and developers are buzzing with anticipation (and a healthy dose of skepticism!). Join Mark Tucker and Allen Firstenberg, your Two Voice Devs, as they dissect the news, explore the potential of the AI-native SDKs, and debate whether this overhaul will reignite the spark for Alexa development.
In this deep dive, we cover:
* The basics of Alexa Plus: What it is, who gets it for free, and how it differs from classic Alexa skills.
* The fate of classic Alexa skills: Are they migrating, evolving, or being left behind? We explore how current skills might benefit from AI enhancements.
* Alexa's New AI SDKs (Alexa+):
** Action SDK: Turn your existing APIs into voice experiences. Is it all about selling stuff?
** WebAction SDK: Integrate your website with Alexa using low-code workflows. But how does it really work?
** Multi-Agent SDK: Surface your existing bots and agents through Alexa. What's the difference between these and existing Alexa skills?
* The Big Questions: Personalization, monetization, notifications, handling hallucinations, response times, identity, and more!
* And finally, our predictions! Will Alexa Plus make developing for Alexa fun again? Mark and Allen give their takes!
Whether you're a seasoned Alexa developer or just curious about the future of voice interfaces, this episode is packed with insights, questions, and a healthy dose of developer humor. Subscribe to Two Voice Devs for more cutting-edge discussions on voice technology!
More Info:
* https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2025/02/new-alexa-announce-blog
Timestamps:
0:00:00 Introduction
0:01:00 Alexa Plus Overview
0:02:00 Pricing & Classic Skills
0:05:00 Access & Availability
0:06:00 Alexa AI SDKs
0:12:00 Action SDK
0:21:00 WebAction SDK
0:27:00 Multi-Agent SDK
0:31:00 Big Questions for Developers
0:36:00 Will Alexa Be Fun Again?
0:41:00 Response Times & Notifications
0:45:00 Multimodal Experiences
0:46:00 Conclusion
#Alexa #AlexaPlus #VoiceDevelopment #AI #LLM #Amazon #Skills #VoiceFirst #Podcast #Developer #Tech #ArtificialIntelligence #TTS #ASR #ConversationalAI

Episode 229 - Imagen 3: Image Editing Powers for Artists and Developers
Allen Firstenberg and Linda Lawton dive deep into the power of Google's Imagen 3 Editing API. Discover how to effortlessly edit and enhance images, opening up a world of creative possibilities for developers!
* Learn how the In-Painting/In-Filling feature can quickly remove wires from an image, add highlights, correct shading on AI-generated images, and more.
* Explore how to create your own 3D-printed objects from scratch using AI.
* Discover how you can reference images to put models or products into a specific scene.
* Learn how to use the Out-Painting feature to extend images beyond their original boundaries, transforming portraits into landscapes and beyond.
Also, be prepared for some unexpected and hilarious AI hallucinations along the way as Allen tries to zoom out from an image multiple times! Plus, the duo discusses the ethical implications of AI-generated content and how creatives can leverage these tools to enhance their own artwork.
Don't miss this exciting exploration of Imagen 3 and its potential to revolutionize image manipulation for developers and creators alike!
Timestamps:
00:00:00 Introduction
00:00:55 Imagen 3 Editing API
00:04:36 In-Painting/In-Filling
00:04:52 Generating 3D Models
00:09:00 Vertex AI Studio
00:10:15 Imagen and Gemini Together
00:13:14 Generating Images with Reference Images
00:20:11 Out-Painting
00:31:00 Ethical Implications
#Imagen3 #AI #ImageEditing #GoogleAI #VertexAI #VertexAISprint #MachineLearning #DeveloperTools #GenerativeAI #GenAI #3DPrinting #AIArt

Episode 228 - AI Ethics: How Developers Can Build Fairer Systems
Are you building AI models and systems? Then you need to understand AI ethics! In this episode of Two Voice Devs, Allen Firstenberg welcomes Parul, a Senior Production Engineer at Meta, to dive deep into the world of AI ethics. Learn why fairness and bias are critical considerations for developers, and discover practical techniques to mitigate bias in your AI systems.
Parul shares her experiences and passion for AI ethics, detailing how biases in training data and system design can lead to unfair or even harmful outcomes. This episode provides concrete examples, actionable advice, and valuable resources for developers who want to build more ethical and equitable AI.
More Info:
* Fairlearn: https://fairlearn.org/
* AIF360: https://aif360.readthedocs.io/en/stable/
* What-If Tool: https://pair-code.github.io/what-if-tool/
Timestamps:
00:00:00 Introduction
00:00:20 Guest Introduction: Parul, Meta
00:02:22 What is AI Ethics?
00:06:13 Why is AI Ethics Important?
00:08:15 AI Systems vs. AI Models
00:09:52 Examples of Bias in AI Systems
00:12:23 Minimizing Biases: Developer Responsibility
00:14:53 Tips for Minimizing Unfairness and Biases
00:19:40 Fairness Constraints: Demographic Parity
00:23:17 The Bigger Picture: Roles & Responsibilities
00:29:23 Monitoring: Bias Benchmarks
00:32:00 Open Source Frameworks for AI Ethics
00:34:02 Call to Action & Closing
#AIethics #Fairness #Bias #MachineLearning #ArtificialIntelligence #Developers #OpenSource #EthicalAI #TwoVoiceDevs #TechPodcast #DataScience #AIdevelopment

Episode 227 - LLM Evaluation: Choosing the RIGHT Model
Are you overwhelmed by the sheer number of Large Language Models (LLMs) available? Choosing the right LLM for your project isn't about picking the most popular one – it's about understanding your specific needs and rigorously evaluating your options.
In this episode of Two Voice Devs, Allen Firstenberg and guest host Brad Nemer, a seasoned product manager, dive deep into the world of LLM evaluation. They go beyond the marketing buzz and explore practical tools and strategies for making informed decisions.
Whether you're a developer, a product manager, or just curious about the practical applications of LLMs, this episode provides invaluable insights into making the right choices for your projects. Don't get caught up in the hype – learn how to evaluate LLMs effectively!
More Info:
https://www.udacity.com/blog/2025/01/how-to-choose-the-right-ai-model-for-your-product.html
[00:00:00] Introduction: Meet Brad Niemer
[00:00:38] Brad's Journey to Product Management & AI
[00:03:12] Collaboration with Noble Ackerson and the LLM Evaluation Challenge
[00:05:23] The Role of a Product Manager
[00:07:43] How Product Managers Relate to Engineering
[00:13:46] Exploring Evaluation Tools: Hugging Face
[00:16:58] Exploring Evaluation Tools: Chatbot Arena (Human Evaluation)
[00:20:30] Chatbot Arena: Code Generation Evaluation
[00:24:43] Evaluating LLMs: Beyond Chatbots and Truth
[00:26:11] Exploring Evaluation Tools: Artificial Analysis (Quality, Speed, Price)
[00:28:47] Exploring Evaluation Tools: Galileo (Hallucination Report)
[00:31:16] Case Study: DeepSeek and the Importance of Contextual Evaluation
[00:34:53] The Future of LLM Testing and Quality Assurance
[00:37:49] Wrap-Up and Contact Information
#LLM #LargeLanguageModels #AIEvaluation #ProductManagement #TechTalk #TwoVoiceDevs #HuggingFace #GenAI #GenerativeAI #ChatbotArena #ArtificialAnalysis #Galileo #DeepSeek #ChatGPT #Gemini #Mistral #Claude #ModelSelection #AIdevelopment #SoftwareDevelopment #Testing #QA #RAG #MachineLearning #NLP #Coding #TechPodcast #YouTubeTech #Developers

Episode 226 - Examining Google's Perspective on Agents
Google's white paper on AI Agents has sparked debate – are they truly the next leap in AI, or just large language models dressed up with new terminology? Join Allen and Mark of Two Voice Devs as they dive into the details, exploring the potential of Google's framework while also critically examining its shortcomings. They analyze the core components of agents – models, tools, and orchestration – highlighting the value of defining tools as capable of taking actions. But they also raise key questions about the blurry line between models and agents, the confusing definitions of extensions and functions, and the critical omission of authentication and identity considerations. This episode is a balanced take on a fascinating and complex topic, offering developers valuable insights into the evolution of AI systems.
Key Moments:
[00:00:20] The core definition of agents: A promising start, or too broad?
[00:05:08] Model vs. Orchestration: Understanding the decision-making layers.
[00:17:33] "Tools" Unpacked: Exploring actions, extensions, and functions
[00:25:14] The crucial gap: Authentication, Identity, and User context.
[00:29:36] Reasoning techniques: ReAct, Chain of Thought, and Tree of Thought explained.
[00:35:41] The model-agent debate: Where is the boundary line?
[00:37:45] Setting the stage for Gemini 2.0?
[00:39:06] A valuable discussion starter, but with room to grow.
Hashtags:
#AIAgents #GoogleAI #LLM #GenerativeAI #AIInnovation #TechAnalysis #TwoVoiceDevs #AIDevelopment #AIArchitecture #SoftwareEngineering #DeveloperPodcast #GeminiAI #MachineLearning #DeepLearning #AITools #Authentication #TechDiscussion #BalancedTech

Episode 225 - AI, Personalization, and the Future of UX
Join Allen Firstenberg as he welcomes Lee Mallon, a first-time guest host, for an in-depth discussion about the future of development, user experiences, and the exciting potential of AI-driven personalization! Lee shares his journey from coding on a Toshiba MX 128k to becoming CTO of Yalpop, a company reinventing learning through personalized experiences. This isn't just another AI hype-cast – it's a deep dive into how we can shift our mindset to truly put users at the center of our development process, leveraging new tech to create delightful and efficient experiences.
Lee and Allen discuss everything from the limitations of current recommendation engines to the emerging potential of AI agents and just-in-time interfaces. This is a must-watch for any developer looking to stay ahead of the curve and build truly impactful applications.
#AI #ArtificialIntelligence #GenAI #GenerativeAI #Personalization #UserExperience #UX #Development #WebDev #FutureOfTech #LLMs #LargeLanguageModels #AIagents #MachineLearning #SoftwareDevelopment #Programming #WebDevelopment #TwoVoiceDevs #Podcast #TechPodcast #Innovation #Code #Coding #Developer #TechTrends #UserCentricDesign #Web4 #NoCode #LowCode #DigitalTransformation

Episode 224 - AI is Coming for Your Code! (Is That a Bad Thing?)
Hold onto your keyboards, folks! AI is shaking up the software engineering world, and in this electrifying episode of Two Voice Devs, Allen and Mark are diving headfirst into the chaos. We're not just talking about the theory – we're getting real about how AI coding tools are actually impacting developers right now. Is this the end of coding as we know it, or the dawn of a new era of software creation?
More Info:
* https://newsletter.pragmaticengineer.com/p/how-ai-will-change-software-engineering
* https://addyo.substack.com/p/the-70-problem-hard-truths-about
[00:00:00] Introduction: Meet Allen and Mark and hear about their busy start to the year.
[00:00:39] The Trigger: Discover the article from The Pragmatic Engineer that sparked this conversation about the role of AI in software engineering.
[00:02:16] Addressing the Panic: We discuss the common fear: is AI going to steal developer jobs?
[00:03:34] Key Article Points: Allen breaks down the seven key areas of the article: how developers are using AI, the "70% Problem," and more.
[00:04:43] Design Patterns & Craftsmanship: Mark discusses how AI-driven development relates to established software patterns and developer craftsmanship.
[00:07:44] The Knowledge Paradox: Unpack the key difference in how senior and junior developers use AI and the potential issues it raises.
[00:10:06] AI vs. Stack Overflow: We explore the differences between getting code from AI and from community platforms like Stack Overflow.
[00:12:49] Personal Experiences: Allen and Mark share how they're actually using AI tools in their coding workflows.
[00:17:09] AI Usage Patterns: Discussing the "constant conversation", "trust but verify", and "AI first draft" patterns.
[00:20:55] The 70% Problem Revisited: Is AI just getting us part way there?
[00:23:24] AI as a Team Member: Exploring the idea of AI as a pair programming partner and whether it's actually helping.
[00:24:41] Trusting your Experience: the importance of listening to the gut feeling of an experienced developer when AI-generated code "feels" wrong.
[00:26:06] Programming Languages are Easy for AI: The simplicity and consistency of programming grammars.
[00:27:47] Is English the New Programming Language?: We debate the idea that natural language is becoming as important as coding and discuss what "programming" really means.
[00:30:36] The Problem with Trying to Make Programming Easy: Historical attempts to make programming easier are revisited.
[00:32:37] Programming vs the Rest of the Job: The core job of a software developer is more than just programming and writing code.
[00:37:21] Quality & Craftsmanship in the Age of AI: We explore what will make software stand out in the future and how crafting great software still matters.
[00:40:27] AI for Personal Software: Could AI drive a renaissance in personal software, similar to the spreadsheet?
[00:42:53] The Importance of AI Literacy: Mastering AI development is the new skill to make developers even more valuable.
[00:43:47] Closing Thoughts: The essential skills of developers remain crucial as we move into the future of AI driven coding.
[00:44:59] Call to Action: We encourage you to join the conversation and share your thoughts on AI and software development.
This isn't just another tech discussion – it's a high-stakes debate about the soul of software engineering. Will AI become our greatest ally, or our ultimate replacement? Tune in to find out!
#AIApocalypse #CodeRevolution #SoftwareEngineering #ArtificialIntelligence #Coding #Programming #Developers #TechPodcast #TwoVoiceDevs #MachineLearning #AICoding #FutureofCode #TechDebate #DeveloperSkills #CodeCraft #AIvsHuman #CodeNewbie #SeniorDev #JuniorDev #TechTrends

Episode 223 - Grounding Gemini with Google Search and LangChainJS
Join Mark and Allen, your favorite Two Voice Devs, as they explore the exciting (and sometimes frustrating!) world of Gemini 2.0's search grounding capabilities and how to use them with LangChainJS! Allen shares his recent holiday project: a deep dive into Google's latest AI tools, including the revamped search grounding feature, and how he made it work seamlessly across Gemini 1.5 and 2.0. We'll show you the code and demonstrate, with real-world examples, how responses differ with and without search grounding. Learn how to build your own powerful, grounded AI applications and stay ahead of the curve in the rapidly changing AI landscape!
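As a rough companion to the walkthrough, a sketch of the with-and-without-grounding comparison in LangChainJS might look like the following. It assumes the @langchain/google-genai package and its bindTools interface; the exact tool object for search grounding (googleSearchRetrieval for Gemini 1.5 vs. googleSearch for 2.0) is precisely the moving target the episode digs into, so the shapes below are assumptions rather than the episode's code:

```typescript
// A rough sketch of grounding a Gemini model with Google Search via LangChainJS.
// Assumes @langchain/google-genai; the search tool object shape is an assumption
// and differs between Gemini 1.5 (googleSearchRetrieval) and 2.0 (googleSearch).
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const model = new ChatGoogleGenerativeAI({
  model: "gemini-2.0-flash",
  apiKey: process.env.GOOGLE_API_KEY,
});

// Bind the (assumed) Gemini 2.0 search tool so responses are grounded in live results.
const groundedModel = model.bindTools([{ googleSearch: {} }]);

// The same query with and without grounding, mirroring the episode's demo.
const question = "Who won the Nobel Prize for Physics in 2024?";
const ungrounded = await model.invoke(question);
const grounded = await groundedModel.invoke(question);

console.log("Without grounding:", ungrounded.content);
console.log("With grounding:", grounded.content);
// Grounded responses also carry source/citation metadata on the response object,
// which the episode shows how to surface alongside the answer.
```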
In this episode, you'll discover:
[00:00:00] Introduction to Two Voice Devs and what we've been up to
[00:00:24] Allen discusses tackling bug fixes and updates with Gemini 2.0 and LangChain
[00:00:51] The new Gemini 2.0 Search Grounding Tool: what's new? What does it mean to be "agentic"?
[00:02:13] Allen dives into the Google Search Tool, understanding the differences between 1.5 and 2.0, and building a layer for easy use in LangChain
[00:03:06] Allen walks us through the code! The magic of setting up a model with or without search capabilities in LangChainJS
[00:04:48] Using output parsers and annotating your results in LangChainJS
[00:05:53] Similarities between Perplexity's results, and how LangChainJS handles output
[00:06:46] Running the same query with and without grounding, and the dramatic difference in the response (Who won the Nobel Prize for Physics in 2024?)
[00:08:26] A closer look at how LangChainJS presents its source references and how to use them in your projects.
[00:12:55] Taking advantage of tools that Google is providing
[00:13:20] The goal of keeping backward compatibility for developers
[00:15:39] Exploring how this is a version of RAG and how that compares to using external data sources
[00:16:50] What are data sources in VertexAI and how they relate?
[00:19:14] What is the cost? How is Google pricing the search capability?
[00:20:59] More to come soon from Allen with LangChainJS!
Don't miss this deep dive into cutting-edge AI development! Like, subscribe, and share if you find this information helpful!
#Gemini #LangChain #LangChainJS #AI #ArtificialIntelligence #GoogleAI #VertexAI #SearchGrounding #RAG #RetrievalAugmentedGeneration #LLM #LargeLanguageModels #OpenSource #TwoVoiceDevs #Programming #Coding #GoogleSearch #DataScience #MachineLearning #Innovation #TechPodcast #TechVideo

Episode 222 - 2024 Recap / 2025 Predictions: AI, Agents, Voice Assistants, and More
Happy New Year from Two Voice Devs! Join Allen Firstenberg and Mark Tucker as they recap the whirlwind that was 2024 in AI and tech, and make some bold predictions for 2025. We're diving deep into the biggest players, the most exciting innovations, and the developer challenges that lie ahead. From OpenAI's O3 and Google's Gemini 2.0 to the rise of Anthropic and the resurgence of head-mounted wearables, we preview the stories we'll be talking about this year. Plus, we discuss the important questions around the cost, both financial and environmental, of this new AI landscape.
This episode is packed with insights for any developer looking to navigate the rapidly evolving world of artificial intelligence. Don't miss our discussion on what "agents" actually mean, and what the future holds for voice assistants.
Timestamps:
[00:00:00] Intro & Happy New Year! - Welcome to Two Voice Devs and a fun fact to kick things off!
[00:01:14] Looking Back at 2024: A recap of the biggest AI movers of the past year.
[00:01:32] OpenAI & Google Dominate: Analyzing the impact of OpenAI and Google's announcements, including O3, Gemini 2.0 Thinking, Sora, Veo, and Imagen.
[00:04:02] OpenAI's Internal Turmoil and Google's Notebook LM: A look into the organizational chaos at OpenAI and the impressive upgrade of Notebook LM using Gemini 2.0.
[00:05:24] Apple Intelligence, Amazon & the Catch-Up Game: Discussions around the progress of Apple and the challenges Amazon is facing, along with Anthropic.
[00:08:15] Meta's LLAMA Models and Ray Bans: Exploring the surprising impact of Meta's AI models and the resurgence of head-mounted wearables.
[00:10:04] Developer Realities: Fine-Tuned Models & DevOps: Mark discusses the importance of smaller, fine-tuned models and DevOps practices for language models.
[00:11:54] The Environmental & Ethical Concerns of AI: A critical discussion about the environmental impact, ethical concerns, and privacy considerations of large language models.
[00:13:15] Allen's 2024 Contributions: LangChain.js and GDE Presentations: Allen shares his work on open-source projects like LangChain.js and his travels as a GDE.
[00:16:56] 2025: Predictions and "Agents": The focus is on the emergence of "agents" and the uncertainty surrounding their definition.
[00:19:18] Defining "Agents": Allen lays out his predictions of what makes an agent.
[00:20:58] The Resurgence of Voice Assistants: Discussing the future of voice assistants and the potential revival with the emergence of new technologies.
[00:23:59] Google's Project Astra and Android XR: Exploring the new integrations in the voice and AI spaces.
[00:24:46] Home Assistant: An Open Source Alternative: A deep dive into this lesser-discussed project and its voice hardware offering.
[00:27:09] Amazon's Catch-Up: Is Amazon ready to get back into the AI and voice assistant game?
[00:28:01] Looking forward into the future of LLMs: Predictions on LLMs and where they're going.
[00:29:20] Outro: Thank you for joining Two Voice Devs.
#AI #GenAI #ArtificialIntelligence #MachineLearning #LargeLanguageModels #LLM #OpenAI #Google #Gemini #Sora #Anthropic #Meta #LLAMA #AppleIntelligence #Amazon #Alexa #VoiceAssistant #Wearables #MetaRayBans #AndroidXR #GoogleGlass #Agents #Developer #Tech #Innovation #LangChain #GDE #TwoVoiceDevs #Podcast #YouTube #2024Recap #2025Predictions #TechTrends #Programming #Coding #OpenSource

Episode 221 - AI Holiday Gift Guide: Amazon, Meta, Google, OpenAI, and MORE!
It's the holiday season, and the AI world has been showering us with gifts! Join Mark and Allen on Two Voice Devs as they unwrap a mountain of new announcements and releases from Amazon, Meta, Google, and OpenAI. From groundbreaking new models to developer-friendly tools, this episode is packed with insights on the latest advancements in AI. We'll explore the features and potential of each new "present" and discuss what it means for you, the developer.
[00:00:00] Intro and Holiday Greetings: Mark and Allen kick off the show, reflecting on the recent flurry of AI releases.
[00:00:15] The AI Gift Giving Season: A lighthearted introduction on the sheer volume of new AI tools being released.
[00:01:41] Amazon Nova Models: Amazon's surprising release of multiple new models, including Micro, Lite, and Pro, with a peek at Canvas (image generation) and Reel (video generation).
[00:04:42] Meta's Llama 3.3: The focus on multilingual capabilities and open-source nature of Llama 3.3.
[00:05:38] OpenAI vs. Google Announcement Showdown: The back-and-forth between Google and OpenAI with a focus on developer-related announcements.
[00:06:40] Google's Imagen 3 & Veo: Google's new advancements in image and video generation available on Vertex AI, including image editing via prompting.
[00:07:28] OpenAI's Sora Release: OpenAI makes their impressive video generation model available, but notably, not yet via API.
[00:08:34] OpenAI's Canvas for Code: Explore how you can interact with code as a chatbot on a virtual canvas.
[00:09:21] Microsoft's Expanded Copilot Free Tier: A note about Microsoft expanding access to their code tool.
[00:09:38] Google's Jules: The AI Bug Detective: An introduction to Google's automated bug-fixing system which proposes fixes in a version control branch.
[00:11:09] OpenAI's O1 Model: The official release of the O1 model with function calling, structured output, and image input capabilities.
[00:11:42] Gemini 2.0 API: Google's improved Gemini API, now in public preview, offering better performance with optimized tools.
[00:14:01] OpenAI's Real-Time API & WebRTC: Details about real time APIs, including WebRTC support for simplified browser-to-server connections.
[00:16:15] Google's Gemini 2.0 Live API: Real-time streaming API using WebSockets for multimodal input and output, with demos available on AI Studio.
[00:17:01] Google's New SDKs: A deep dive into the unified libraries for AI Studio and Vertex AI, simplifying things for developers.
[00:18:10] OpenAI's new Java and Go Libraries: OpenAI ups their game by adding libraries to match Google's supported development platforms.
[00:19:49] Google's PaliGemma 2 and Android XR: Vision-enabled open model, and a new Android platform for headsets and smart glasses.
[00:22:04] Wrapping Up: Mark and Allen discuss which tools they're most excited about for the break and what's in store for the future.
Let us know in the comments what you're most excited about, or if you noticed anything we missed. We’ll discuss it on future episodes.
#AI #ArtificialIntelligence #MachineLearning #GenerativeAI #LLM #LargeLanguageModels #AmazonNova #Llama3 #Gemini2 #OpenAI #GoogleAI #VertexAI #AIStudio #ChatGPT #GPT #O1 #Reasoning #ImageGeneration #VideoGeneration #DeveloperTools #Coding #Programming #WebRTC #AndroidXR #TechNews #TwoVoiceDevs

Episode 220 - How to Actually Explain Complex Tech Without Being Boring
Ever felt like your tech presentations, tutorials, or even code explanations are falling flat? You're not alone! In this episode of Two Voice Devs, Allen and Mark dive deep into the art of effective communication in tech, exploring how to move beyond just listing facts to building a compelling narrative that actually helps people understand.
Inspired by a recent presentation that Allen felt was "just okay," they tackle the challenge of how to present information in a way that resonates, whether you're on stage, creating content, or mentoring new developers.
[00:00:00] Introduction to Two Voice Devs
[00:00:16] End-of-year craziness and the inspiration for the episode
[00:00:40] Allen's experience with a presentation that felt flat despite positive feedback.
[00:01:31] The realization of a missing narrative in the presentation.
[00:02:27] Discussion of building narrative into different types of content.
[00:02:33] Deep dive into the structure and content of Allen's Gemini presentation.
[00:04:04] The real message Allen was hoping to convey, and where the presentation fell short.
[00:05:34] The importance of the "why" behind the "what" when presenting new features and concepts.
[00:05:50] Exploring the concept of "telling a story" to make technical concepts easier to understand.
[00:06:29] How individual learning experiences influence the way that you present material.
[00:07:51] Balancing the desire to include all the information, while also keeping a succinct message.
[00:08:50] Pivoting to talking about other ways of imparting information.
[00:09:07] Mark's method of learning and creating diagrams, which then turn into a video.
[00:11:08] The challenge of jumping into code without sufficient background.
[00:12:10] Presenting information in the order that makes sense to you and why.
[00:12:59] Learning by creating and being willing to share even when you are still learning.
[00:13:39] Why committing to a presentation helps you learn a subject.
[00:14:44] Using social media to get information out there quickly, and also, sample projects.
[00:15:27] How starting with small chunks of code can help with understanding
[00:16:31] Using AI tools to explain code.
[00:17:13] How developers need to understand why code works, and not just that it works.
[00:18:58] Why it's important to make learning a conversation and asking questions.
[00:19:29] Mentoring and understanding where students are starting from.
[00:20:54] How in-person feedback is both a benefit and a challenge.
[00:22:12] Creating a safe space for collaborating and learning together.
[00:23:38] Working together to get a level of understanding.
[00:24:13] Call to action for audience to share their techniques.
#TechContent #TechTutorials #DeveloperPresentations #Mentoring #SoftwareDevelopment #CodeTutorial #DevTips #TechNarrative #CommunicationSkills #TwoVoiceDevs #Coding #SoftwareEngineering #Teaching #Learning #AI #Gemini #Storytelling #MadeToStick #TechnicalCommunication #DevFest #Programming

Episode 219 - The Ethics of Data Scraping and LLMs
Join Mark and Allen on Two Voice Devs this week as they delve into a critical discussion about data scraping, large language models (LLMs), and the ethical responsibilities of developers. From the recent controversy surrounding BlueSky data scraping and Hugging Face datasets to the complexities of copyright law and personal privacy in the age of AI, this episode explores the gray areas and tough questions facing developers today. Hear their perspectives on the potential misuse of publicly available data, the challenges of anonymization, and the importance of upholding ethical standards in a rapidly evolving technological landscape. They also share personal anecdotes about navigating privacy policies and the dilemmas of data collection for business versus personal use. Tune in to gain valuable insights and contribute to the conversation about responsible development practices.
[00:00:00] Introduction
[00:01:04] Mark's deep dive into BlueSky's architecture and the data scraping controversy.
[00:02:27] Discussion on BlueSky's data policy and user ownership.
[00:05:32] Copyright implications of using scraped data in LLMs.
[00:06:22] Exploring ethical data sources for LLM training (Wikipedia, Reddit, etc.).
[00:08:31] Real-world examples of potential copyright infringement in image and video generation.
[00:09:34] Hugging Face's guidelines and the removal of the BlueSky dataset.
[00:12:19] The curious case of the "David Meyer" bug in ChatGPT and its implications for data privacy.
[00:14:24] Allen's personal dilemma with Vodo Drive's privacy policy and data collection for model training.
[00:16:50] Balancing business needs with ethical data practices.
[00:17:00] Allen's challenge gathering Gemini release notes and his ethical solution.
[00:19:20] The ethical responsibilities of software engineers, drawing parallels to the Challenger disaster.
[00:21:19] The developer's role in advocating for ethical data usage.
[00:22:21] Call to action: Share your thoughts and perspectives!
#DataScraping #LLMs #AIethics #DeveloperEthics #Privacy #Copyright #BlueSky #HuggingFace #SoftwareEngineering #DataPrivacy #AI #TwoVoiceDevs #Podcast #TechPodcast #WebSockets #DataScience #EthicalAI #ResponsibleAI #TechEthics #Gemini #GoogleAI

Episode 218 - Jovo's Sunset: A Celebration and Look Ahead
The Jovo open-source framework, a beloved tool for building cross-platform voice applications, is being archived. Join Mark and Allen as they discuss Jovo's history, its impact on the voice development landscape, and what its sunset means for developers. While the news might be bittersweet, we take this opportunity to celebrate Jovo's contributions and explore the valuable lessons learned from its innovative approach to voice app development. We delve into the framework's key features, including its plugin and pipeline architecture, and discuss how these concepts can still inspire future voice projects. Plus, Mark shares his personal experiences using Jovo and hints at exciting potential future directions for forked versions of the framework. Whether you're a seasoned Jovo user, a curious voice developer, or interested in open-source contributions, this episode offers insights and inspiration for your next voice project.
More info:
- https://github.com/jovotech/jovo-framework
- https://www.youtube.com/watch?v=5rce0KGFyz8
[00:00:00] Introduction and Disappointing News
[00:01:41] What was Jovo?
[00:06:18] Early Jovo Encounters
[00:07:38] The Vision of Jovo
[00:09:24] Jovo's 4 P's: Purpose, Platforms, Pipelines, and Plugins
[00:14:47] Abstraction Layers and Modern Analogies (LangChain, GenKit)
[00:17:20] The Official Announcement: Jovo's Archiving
[00:18:44] What Archiving Means for Developers
[00:22:25] Reflections on Jovo's Impact and Future Directions
[00:25:44] The Importance of Contributing to Open Source
[00:26:19] Lessons Learned from Jovo and Open Source Contributions
#Jovo #VoiceDevelopment #OpenSource #VoiceApps #AlexaSkills #GoogleAssistant #Chatbots #Frameworks #SoftwareDevelopment #TypeScript #JavaScript #Innovation #Community #Collaboration #NextJS #React #LangChain #GenKit #VoiceFlow #Podcast #TwoVoiceDevs #Webhooks #APIs #NLU

Episode 217 - A Thanksgiving Tradition: Gratitude, Community, and the Ever-Changing World of Development
Join Allen Firstenberg and Mark Tucker in this heartwarming Thanksgiving episode of Two Voice Devs! In a year of rapid advancements and shifts in the tech landscape, they take a moment to express sincere gratitude for the people, opportunities, and community that make their developer journey so rewarding. They reflect on the challenges and triumphs of the past year, the importance of support networks, and the exciting future of AI and development. Whether you're a seasoned coder or just starting out, this episode is a reminder of the shared passion and connection that fuels our ever-evolving world of technology.
Timestamps:
0:00:00: Introduction and Thanksgiving greetings
0:02:00: Reflection on the Two Voice Devs journey and community support
0:03:42: Gratitude for family and personal connections
0:05:14: Appreciation for supportive work environments and colleagues
0:06:26: Acknowledging the wider developer community and industry advancements
0:07:19: Gratitude for fulfilling careers and the constant learning process
0:10:53: A look forward to future discussions on emerging technologies
0:11:05: Expressing appreciation for each other's partnership on the podcast
0:12:04: Thanks to guest hosts and the importance of diverse voices
0:13:07: An invitation to connect and share your thoughts
#podcast #developers #softwaredevelopment #GenAI #AI #artificialintelligence #community #gratitude #thanksgiving #tech #technology #careers #learning #innovation #TwoVoiceDevs #coding #programming

Episode 216 - DevAI: Threat or Enabler? Live Q&A from Voice & AI 2024
Join Allen Firstenberg and Noble Ackerson, hosts of the Two Voice Devs podcast, for a lively and insightful Q&A session recorded live at Voice & AI 2024! We dive into the burning questions surrounding AI's impact on software development, exploring the potential threats and exciting opportunities presented by tools like GitHub Copilot and Cursor. From the future of junior developers to the ethics of non-deterministic systems, we tackle it all with our signature blend of technical expertise and opinionated discussion. Don't miss this engaging conversation with a passionate audience, packed with thought-provoking insights and a few laughs along the way.
0:00:00 - Introduction to Two Voice Devs and our live show format
0:01:26 - Noble Ackerson introduces himself and his AI journey.
0:02:00 - Allen and Noble's first meeting at a Google Glass hackathon.
0:06:47 - Q&A begins! Is DevAI a threat to developers?
0:07:08 - Defining DevAI and its role in software development.
0:07:54 - Noble's perspective: AI as an enabler, not a threat.
0:11:55 - The importance of context and prompting in AI systems.
0:14:48 - Addressing the challenge of misalignment between AI goals and user needs.
0:17:02 - Using experience and context to enhance AI's reliability.
0:20:05 - The importance of developer experience in evaluating AI-generated code.
0:20:27 - Ethical considerations of non-deterministic AI systems.
0:24:36 - Exploring the ethics of AI models and data usage.
0:27:29 - The role of designers and developers in responsible AI implementation.
0:27:43 - Agentic systems and the potential for explaining AI's reasoning.
0:29:34 - The need for human oversight in AI systems.
0:30:04 - Ethical concerns regarding AI and potential self-interest of companies.
0:30:55 - The future of AI models: LLMs, neuro-symbolic approaches, and hybrids.
0:31:59 - Reinforcement Learning with Human Feedback (RLHF) and instilling values in AI.
0:33:30 - Where to find Allen and Noble online.
#AI #ArtificialIntelligence #SoftwareDevelopment #DevAI #Podcast #LiveQA #VoiceAndAI #Ethics #LLM #MachineLearning #Developers #Coding #Programming #Technology #FutureofWork #Innovation #GitHubCopilot #Cursor #ResponsibleAI #ExplainableAI #NeuroSymbolicAI #RLHF #TwoVoiceDevs

Episode 215 - Unlock Cross-Platform Machine Learning Model Deployment
Tired of wrestling with platform-specific machine learning model formats? Join Allen Firstenberg and Mark Tucker on Two Voice Devs as they explore ONNX (Open Neural Network Exchange), a game-changing open format built to streamline your ML model deployment workflow. Discover how ONNX empowers you to train models in your preferred framework (PyTorch, TensorFlow, scikit-learn, etc.) and seamlessly execute them across diverse platforms (Windows, Mac, Linux, iOS, Android, Web) using the efficient ONNX Runtime.
In this episode, we delve into:
[00:00:00] Introduction: A warm welcome and a quick overview of the show's agenda.
[00:01:18] What is ONNX?: Unraveling the mysteries of ONNX and its purpose in the ML ecosystem.
[00:02:38] Model Preparation: Understanding how to prepare models for ONNX conversion and the concept of inference.
[00:04:05] Hugging Face Example: A practical demonstration of a BERT model in ONNX format on Hugging Face.
[00:06:00] The Developer's Perspective: Why ONNX matters for developers building applications that leverage ML models.
[00:07:24] ONNX Optimization: How ONNX optimizes models for inference and the trade-offs involved.
[00:08:56] The Cross-Platform Advantage: Breaking free from framework limitations and enabling deployment flexibility.
[00:11:19] ONNX Runtime Introduction: Exploring the ONNX Runtime and its support for various languages and platforms.
[00:14:04] ONNX Runtime Deep Dive: A closer look at the ONNX Runtime website and its features.
[00:15:45] ONNX for Mobile and Web: Extending ONNX's reach to mobile devices and web browsers.
[00:16:56] Conversion Process: Learn how to convert models from different formats to ONNX.
[00:18:08] Performance Considerations: Addressing concerns about performance and speed in ONNX.
[00:19:58] Code Examples: Practical code snippets demonstrating ONNX Runtime usage in JavaScript, Python, and C#.
[00:23:23] ONNX and MLOps: Integrating ONNX into your MLOps pipeline for seamless deployment.
[00:23:42] Netron Tool Introduction: Visualizing ONNX models using the Netron tool.
Whether you're a seasoned data scientist or a developer just beginning your ML journey, this episode provides valuable insights into leveraging ONNX for efficient and cross-platform model deployment. Share your experiences and questions in the comments below!
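If you want to try the runtime side before watching, here's a minimal sketch (in Python, though the episode also shows JavaScript and C#) of loading an exported model with the onnxruntime package and running one inference. The file name, input shape, and random input are placeholders for this example, not files from the episode.
```python
# Minimal ONNX Runtime inference sketch. "model.onnx" and the (1, 3, 224, 224)
# input shape are illustrative placeholders; swap in your own exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")                # load the serialized graph
input_name = session.get_inputs()[0].name                   # discover the input tensor name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)   # fake image-shaped input
outputs = session.run(None, {input_name: dummy})            # None = return every output
print(outputs[0].shape)
```
The same model file runs unchanged wherever ONNX Runtime is available, which is the cross-platform point Allen and Mark keep coming back to.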
Thumbnail by Imagen 3 with prompt:
Cartoon ink and paint, with a touch of tech.
Scene: Two podcast hosts, sitting in front of microphones,
smiling and engaging in conversation.
Both hosts are male, caucasian, software developers in their early 50s,
wearing glasses, and are clean shaven.
The host on the left is wearing an olive t-shirt and a brown flat cap.
The host on the right is wearing a light blue polo shirt.
Warm, inviting lighting.
Background:
A polished, dark, onyx gemstone, reflecting light and giving it depth.
The gemstone facets should subtly reflect stylized icons of different
operating systems (Windows logo, Apple logo, Android logo, a cloud icon),
hinting at cross-platform compatibility.
Dark, sleek, and mysterious, with the onyx stone as the centerpiece.
The reflected platform icons should be subtle and not overly distracting.
The overall impression should be one of sophisticated power and hidden
potential, alluding to the capabilities of ONNX.
Negative prompt:
beards
#ONNX #MachineLearning #ML #AI #ArtificialIntelligence #DeepLearning #ModelDeployment #CrossPlatform #PyTorch #TensorFlow #ScikitLearn #MLOps #SoftwareDevelopment #WebDevelopment #MobileDevelopment #JavaScript #Python #CSharp #HuggingFace #Netron

Episode 214 - NotebookLM: The Future of Personalized AI Learning for Developers?
Dive into the world of AI-powered learning with Allen and Mark as they explore Google's innovative NotebookLM. This cutting-edge tool offers a fascinating glimpse into the potential of Google's Gemini AI model. NotebookLM allows you to centralize your notes, documents, and even audio/video transcripts, transforming them into an interactive knowledge base. Discover how its conversational interface lets you ask questions, generate summaries with citations, and even create podcasts from your source material!
Allen and Mark discuss how NotebookLM serves as a compelling example for developers looking to build their own Gemini-based applications. They break down how its features, like intelligent summarization, citation generation, and conversational Q&A, can be replicated and customized using the Gemini API. This episode provides valuable insights and inspiration for developers eager to harness the power of Gemini for their own projects.
They also cover practical use cases for students, developers, and anyone looking to personalize their learning experience, while addressing NotebookLM's current limitations. Join the conversation and share your thoughts on how you might use this exciting new technology!
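As a rough illustration of the "replicate it with the Gemini API" idea, here's a minimal grounded-summarization sketch using the google-generativeai Python package. The model name, API key placeholder, and one-line source document are assumptions made for the example, not details from the episode.
```python
# A toy version of NotebookLM-style "summarize my sources with citations".
# The API key, model name, and source text are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

source = "Doc 1: The Gemini API accepts text, image, audio, and video inputs."
prompt = (
    "Summarize the following source material in two sentences and cite the "
    f"document you used in square brackets.\n\n{source}"
)
print(model.generate_content(prompt).text)
```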
Timestamps:
[00:00:00] Introduction
[00:00:50] What is NotebookLM? Core Functionality and Features
[00:02:47] Asking Questions and Getting AI-Generated Summaries with Citations
[00:04:23] Following Citations and References within the Notebook
[00:06:24] Potential Use Cases for Students (PDF Import and Analysis)
[00:07:17] Creating Outlines and Other AI-Assisted Note Enhancements
[00:07:51] Supported File Types and Data Handling
[00:09:10] Generating Podcasts from Source Material (with audio example!)
[00:11:47] Using NotebookLM with Technical Documentation (Gemini API Example)
[00:14:17] Managing and Selecting Sources within a Notebook
[00:14:53] Downsides and Limitations (No API, Manual Processes)
[00:17:43] Comparison to ChatGPT and GPTs
[00:18:41] Sharing and Collaboration Features
[00:19:26] Potential Applications for Developers (Project Management, Code Analysis)
[00:25:11] Integrating with Automated Testing and CI/CD
[00:25:34] Potential Integration with Git Repositories and Version Control
[00:27:12] The Future of AI-Powered Knowledge Systems
[00:28:04] More Potential Use Cases (Research Paper Analysis)
[00:28:36] Customized Learning and Problem Solving with AI
[00:30:06] Conclusion and Call to Action
#NotebookLM #AI #ArtificialIntelligence #MachineLearning #Developers #Programming #SoftwareDevelopment #Productivity #Learning #Education #Podcast #GoogleAI #Gemini #Chatbots #KnowledgeManagement #NoteTaking #AItools #TechPodcast #TwoVoiceDevs

Episode 213 - Scary Developer Stories: A Halloween Special
Boo! Join Two Voice Devs for a special Halloween episode filled with chilling tales from the software development crypt. Mark and Allen recount true stories of coding nightmares, from dropped databases to runaway pings, and offer words of wisdom for surviving your own development horrors. Listen with the lights on (if you dare) as they explore the spooky side of coding, complete with a chilling Halloween soundtrack. Don't forget to share your own scary developer stories in the comments!
Timestamps:
[0:00:00]: Intro and Halloween greetings
[0:00:14]: Drop Dead Data: Mark's database disaster
[0:00:04]: The Growing DNS ID: Allen's DNS scare
[0:00:06]: Death by a Thousand Pings: A coworker's network nightmare
[0:00:08]: The Big O Monster: Allen's tales of inefficient algorithms
[0:00:15]: Imminent Demise: Mark's story of a company's downfall
[0:00:18]: The AI Ghost in the Machine: Allen's cautionary tale about AI-generated code
[0:00:20]: Campfire reflections and call for listener stories
#Halloween #DeveloperDisasters #AI #AICoding #LLMCoding #AIDeveloperTools

Episode 212 - Data Labeling for Developers
Join Mark and Allen, your Two Voice Devs, as they delve into the crucial world of data labeling for machine learning model training. Whether you're a seasoned data scientist or a developer just starting to explore AI, understanding data labeling is essential for building effective models. In this episode, they explore various data labeling techniques, from manual labeling for simple voice apps to automated approaches using open-source libraries like Snorkel. Discover how labeled data powers everything from chatbots and voice assistants to spam filters and advanced language models like BERT. They discuss practical examples and highlight the role developers play in preparing and refining data for optimal model performance.
Timestamps:
[00:00:00] Introduction
[00:00:18] What is data labeling?
[00:03:41] Jovo example: Manual data labeling for voice apps and chatbots
[00:08:01] Labeling with slots and entities
[00:13:54] BERT example: Automated labeling during model training
[00:18:52] BERT inference and fine-tuning
[00:25:36] Snorkel: Programmatic data labeling with Python
[00:29:43] Snorkel example and labeling functions
[00:31:23] Leveraging LLMs for data labeling and augmentation
[00:33:24] The role of developers in data labeling
[00:35:02] Call to action: Share your data labeling experiences!
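To make the programmatic labeling segment concrete, here's a minimal Snorkel-style sketch: two weak labeling functions applied to a toy DataFrame, then combined by a LabelModel. The heuristics, labels, and data are invented for the example and are not the ones discussed on the show.
```python
# Weak supervision sketch with Snorkel: labeling functions vote, the LabelModel
# reconciles the votes into training labels. All data and heuristics are toy examples.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_free(x):
    # Weak heuristic: "free" often signals spam.
    return SPAM if "free" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Another weak signal: very short messages are usually fine.
    return NOT_SPAM if len(x.text) < 20 else ABSTAIN

df = pd.DataFrame({"text": ["Win a FREE cruise today!", "See you at 3pm", "Free $$$ now!!!"]})
L_train = PandasLFApplier(lfs=[lf_contains_free, lf_short_message]).apply(df)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=100)
print(label_model.predict(L_train))  # one label per row, -1 where everything abstained
```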
Thumbnail by Imagen 3 with prompt:
Cartoon ink and paint, with a touch of tech.
Scene: Two podcast hosts, sitting in front of microphones, smiling and engaging in conversation.
Both hosts are male, caucasian, software developers in their early 50s, wearing glasses, and are clean shaven.
The host on the left is wearing a blue t-shirt and a brown flat cap.
The host on the right is wearing a light blue polo shirt.
Warm, inviting lighting.
Background:
Individual data items, represented by squares or circles. Some are marked with a red A while others are marked with a green B. There are dotted lines grouping some of them together.
Negative prompt:
beards
#DataScience #MachineLearning #ML #AI #DataLabeling #DataTraining #ModelTraining #Jovo #BERT #Snorkel #Developers #SoftwareDevelopment #VoiceFirst

Episode 211 - Apple Intelligence and Siri's Future (and Beyond)
Join us for a fascinating conversation with John G, a seasoned voice developer, as he shares his insights into Apple's approach to AI and the future of Siri. John discusses his journey from helping content creators to the Alexa ecosystem and then into the Apple world, driven by the potential of App Intents and the evolving landscape of Apple Intelligence. We delve into the technical details, exploring how App Intents, the semantic index, and on-device LLMs are shaping the future of app development on Apple platforms. Get ready to unravel the complexities and discover the exciting possibilities of building voice-enabled experiences in the Apple ecosystem.
Learn more:
* https://justbyspeaking.com
* https://justbyspeaking.com/2024/06/14/meet-art-museum.html
* https://machinelearning.apple.com/research/introducing-apple-foundation-models
Timestamps:
[00:00:00] Introduction
[00:01:23] John's Journey into Programming and Apple Development
[00:04:12] Entering the Voice Era with Alexa
[00:10:03] Transitioning Away from Alexa
[00:11:58] Discovering App Intents and the Apple Ecosystem
[00:15:54] Introducing Art Museum: John's First iOS App
[00:18:05] History of Siri and SiriKit
[00:22:25] The Rise of App Intents
[00:27:20] App Intents Implementation Details
[00:30:01] The Pervasiveness of App Intents in iOS
[00:32:40] Unveiling Apple Intelligence
[00:36:10] App Intents and the Semantic Index
[00:40:03] App Intents, Domains, and Parameters
[00:43:42] Adapters and the Foundation Model
[00:46:43] Developer Access to the On-Device Model
[00:49:44] Predefined App Intents (Schemas)
[00:51:34] App Shortcuts and Apple Intelligence
[00:52:56] The Future Potential of Apple Intelligence
[00:59:28] Conclusion
#Siri #GenAI #GenerativeAI #ConversationalAI #VoiceFirst #AppleIntelligence #Bixby #AppIntents #LargeLanguageModel #Apple #iOS #MacOS #AppleDeveloper #SoftwareDevelopment #AppDevelopment
Thumbnail by Imagen 3 with prompt:
Cartoon ink and paint, with a touch of tech.
Scene: Two podcast hosts, sitting in front of microphones, smiling and engaging in conversation.
The host on the left is a clean-shaven male caucasian in his early 50s, wearing a light blue polo shirt and glasses.
The host on the right is a young man wearing glasses, an olive-green knit cap,
a short beard and mustache that are light brown, and a dark blue t-shirt.
Warm, inviting lighting.
Background:
A stylized representation of an iPhone displaying a glowing screen border surrounding a symbolic representation of "Apple Intelligence": the Apple logo combined with a human brain.

Episode 210 - Simplifying Generative AI Development with Firebase GenKit & GitHub Models
Join Mark and Xavier on Two Voice Devs as they dive into the world of generative AI development with Firebase GenKit and GitHub Models. Xavier, a Google Developer Expert in AI, Microsoft MVP in AI, and GitHub Star, shares his insights on these emerging technologies and his open-source project that bridges the gap between them. Discover how Firebase GenKit offers a simpler, more modular approach to building GenAI applications compared to frameworks like LangChain. Learn about GitHub Models and how they provide easy access to various LLMs for experimentation and prototyping. Xavier also discusses the challenges and rewards of open-source contribution and how community feedback fuels innovation. This episode is a must-listen for developers looking to explore the exciting landscape of generative AI.
More Info:
* https://firebase.google.com/docs/genkit
* https://github.com/xavidop/genkitx-github
* https://docs.github.com/en/github-models
Timestamps:
[00:00:00] Introduction
[00:00:50] What is Firebase GenKit?
[00:06:08] What are GitHub Models?
[00:09:24] Moving GitHub Models to Production
[00:10:30] The GenKitX GitHub Plugin: Origin, Approach, and Challenges
[00:14:14] Coding with Firebase GenKit
[00:16:07] The Importance of Open Source
[00:19:46] Using Firebase GenKit with Voiceflow
[00:20:35] Conclusion and Call to Action
#GenerativeAI #GenAI #OpenSource #Firebase #FirebaseGenKit #GitHub #LangChain #TypeScript #Golang

Episode 209 - AI-Powered Pronunciation: Conquering Tricky TTS
This episode of Two Voice Devs, recorded before the exciting announcement of OpenAI's GPT-4o Realtime and Audio previews, tackles a classic developer challenge: taming unruly text-to-speech (TTS) engines. Triggered by a listener question, Allen and Mark dive into the frustrating inconsistencies of TTS pronunciation, particularly when dealing with dynamically generated text from LLMs. They explore the limitations of SSML, experiment with phoneme alphabets like X-SAMPA, and even ponder the possibility of multimodal LLMs generating perfect audio natively – a concept now realized with models like GPT-4o Realtime and Audio! While Mark and Allen don't discuss these new models directly, their insights on pronunciation control, leveraging existing tools, and integrating LLMs with TTS remain incredibly relevant. Join us for a conversation that foreshadows the future of AI-powered voice development and offers practical strategies for achieving flawless pronunciation, even in the pre-realtime audio era. These techniques and discussions offer valuable context and potential solutions even as new, more advanced models emerge.
Timestamps:
[00:00:00] Introduction and Listener Question: The challenge of inconsistent TTS pronunciation.
[00:02:01] The Problem in Action: Hear how Google TTS mispronounces a seemingly straightforward phrase.
[00:02:52] Exploring SSML Solutions: The pros and cons of using SSML tags for pronunciation control.
[00:04:15] The Generative Text Challenge: How to handle correct pronunciation when text is dynamically generated.
[00:07:58] The Phoneme Alphabet Approach: Using X-SAMPA to specify pronunciation directly.
[00:09:06] A Live Experiment: Allen demonstrates his phoneme-based solution using AI Studio and Gemini.
[00:10:51] Testing Edge Cases: Exploring the limitations of the phoneme approach with past tense verbs.
[00:12:19] The Multimodal LLM Dream (Now a Reality?): Allen and Mark discuss the potential of LLMs generating perfect audio.
[00:13:20] Alternative Approaches: Mark suggests using parts-of-speech tagging for enhanced context.
[00:15:16] The Future of TTS (Then and Now): Discussing the evolution of text-to-speech technology and its integration with LLMs, including reflections relevant to the latest preview models like GPT-4o Realtime and Audio.
[00:17:22] Community Call to Action: Share your solutions and insights on handling tricky TTS pronunciations! How do the latest LLM advancements impact your approach?
Our thanks to bonadio (https://github.com/bonadio) for their question.
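If you'd like to try the phoneme approach yourself, here's a minimal sketch using the google-cloud-texttospeech Python client with an SSML <phoneme> tag. The voice settings and the X-SAMPA string are illustrative assumptions, and you should double-check that your chosen voice supports phoneme markup.
```python
# Forcing a pronunciation via SSML <phoneme>. The x-sampa string below is only an
# illustration; consult an X-SAMPA chart for the word you actually need.
from google.cloud import texttospeech

ssml = (
    "<speak>"
    'The word <phoneme alphabet="x-sampa" ph="t@.meI.toU">tomato</phoneme> '
    "is pronounced exactly as specified."
    "</speak>"
)

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(ssml=ssml),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
```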
#GenerativeAI #GenAI #TextToSpeech #TTS #MultimodalLLM #Multimodal #BuildWithGemini #OpenAI #GPT4o #GPT4oRealtime #GPT4oAudio #VoiceFirst

Episode 208 - O1: Reasoning Engine or Agent's Brain?
Join us as we dive deep into OpenAI's latest model, O1, with special guest host Michal Stanislavik, founder of utter.one and one of the voice community builders behind VoiceLunch. We explore the model's "reasoning" capabilities, its potential impact on conversational AI, and how developers can leverage its strengths. Michal shares his insights from hands-on experience, highlighting both the exciting possibilities and the current limitations of O1. Is it ready for prime time in conversational applications? What are the most promising use cases? And how does it compare to the GPT family? We discuss all this and more, including the future of agentic systems and the role of open-source models like LLaMa.
Timestamps:
[00:00:00] Introduction - Meet Michal Stanislavik and learn about his background in voice technology and community building.
[00:04:51] ChatGPT's Impact - How ChatGPT changed the conversational AI landscape and raised user expectations.
[00:08:32] Introducing OpenAI's O1 - What is O1 and how does it differ from GPT models?
[00:11:38] Reasoning and Latency - A closer look at O1's reasoning process and the implications of increased latency.
[00:17:30] Beyond Conversational Applications - Exploring potential use cases for O1 outside of real-time chatbots.
[00:23:38] Pre-Conversation Processing - Using O1 for data ingestion, analysis, and preparing follow-up questions.
[00:25:41] Code Generation Capabilities - O1's potential as a powerful tool for developers, including code generation and UI development.
[00:32:21] O1 and Agentic Systems - How O1 could become the foundation for future generations of intelligent agents.
[00:35:07] Tool Use and the Future of Agents - Integrating tools into the reasoning process and the importance of voice-to-voice models.
[00:43:39] The Competitive Landscape - How will other players like Anthropic, Google, and Meta respond to O1? The importance of open-source alternatives.
[00:46:11] Conclusion and Connect with Michal - Final thoughts and how to find Michal online.
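As a rough sketch of the offline, pre-conversation style of use the episode favors for a slower reasoning model, here's what a one-off O1 call looks like with the openai Python package. The model identifier reflects the preview naming at the time of recording, and the transcript is invented; treat this as an illustration, not guidance from Michal.
```python
# Offline "pre-conversation processing" sketch: let a slow reasoning model digest
# material before a real-time conversation starts. Transcript and prompt are toy examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "Customer: my last three invoices were wrong and support never replied..."
response = client.chat.completions.create(
    model="o1-preview",  # preview-era model name; adjust to whatever is current
    messages=[{
        "role": "user",
        "content": (
            "Analyze this customer history and propose three follow-up questions "
            f"an agent should ask at the start of the next call:\n\n{transcript}"
        ),
    }],
)
print(response.choices[0].message.content)
```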
#GenerativeAI #GenAI #Strawberry #ConversationalAI
Thumbnail by Imagen 3 with prompt:
Podcast thumbnail, illustrative with a touch of tech, cartoonish. Scene: Two podcast hosts, sitting in front of microphones, smiling and engaging in conversation. Both hosts are male, caucasian, software developers in their early 50s, with closely cropped hair. The host on the left is wearing an olive green t-shirt and has a closely cropped black and white beard. The host on the right is wearing a light blue polo shirt, glasses, and is clean shaven. Background: a dark blue, futuristic landscape with glowing lines and nodes representing data flow. Overlaid on this background is a large, semi-transparent, stylized "O1" logo, similar to OpenAI's branding. Small icons representing code snippets, a chat bubble, strawberries, and a robot's head float around the O1. The overall mood is conversational and tech-focused, but not overly serious. Emphasis should be placed on the connection between the hosts and the futuristic, AI-driven theme.

Episode 207 - Mentorship in Software Development
Join Mark and Allen on this episode of Two Voice Devs as they dive into the often overlooked but crucial topic of mentorship in software development. They explore what mentorship is (and isn't), the benefits for both mentor and mentee, and share personal anecdotes and practical advice. Whether you're a seasoned developer or just starting out, this episode offers valuable insights into fostering a culture of learning and growth within development teams.
Timestamps:
0:00:00 - Introduction
0:00:45 - Defining Mentorship: It's more than just giving advice
0:04:15 - Mentorship vs. Education: What's the difference?
0:07:35 - Real-World Mentoring Examples: From junior devs to senior colleagues
0:09:55 - "Nobody cares how many people you had to ask for help..."
0:13:05 - A Cautionary Tale: When mentorship goes wrong
0:18:35 - Creating a Safe Space: Encouraging questions and celebrating wins
0:21:40 - The Mentor's Responsibility: Be prepared and available
0:24:55 - Vulnerability is Key: It's okay to say "I don't know"
0:26:55 - Advice for Mentees: Be open to help and willing to learn
0:28:50 - It's "Our Code": Shifting the mindset from individual ownership
0:31:10 - Mentorship as a Craft: Passing down knowledge to the next generation
0:33:40 - Call to Action: Share your mentorship experiences and questions!
Thumbnail generated by Imagen 3 with prompt:
Podcast thumbnail, illustrative with a touch of tech, cartoonish.
Scene: Two podcast hosts, sitting in front of microphones,
smiling and engaging in conversation.
Both hosts are male, caucasian, software developers in their early 50s,
wearing glasses, and are clean shaven.
The host on the left is wearing a dark red t-shirt and a brown flat cap.
The host on the right is wearing a light blue polo shirt.
Warm, inviting lighting.
Background: Elements subtly hinting at a collaborative workspace
with many people involved.
One figure gestures towards the people in the background suggesting mentorship
while the other listens intently, holding a water mug with a straw.
Negative prompt: beards

Episode 206 - Building Powerful AI Agents with LangGraph
Dive into the world of agentic AI development with Allen and Mark as they explore LangGraph, a powerful state management system for building dynamic and complex AI agents with LangChain. Discover how LangGraph simplifies agent design, handles state transitions, integrates tools, and enables robust error handling – all while keeping the LLM at the heart of your application.
Further Info:
* https://github.com/langchain-ai/langgraphjs
* https://github.com/langchain-ai/langgraphjs-studio-starter/
* https://vodo-drive.com/
Timeline:
0:00 - Introduction
1:05 - What is LangGraph and how does it work?
2:38 - Understanding Nodes and Tools in LangGraph
4:54 - Navigating Conditional Edges and State Transitions
7:00 - Real-world example: LangGraph in action
12:56 - State Management in LangGraph: Maintaining Context
14:24 - Comparing LangGraph to Jovo and Voiceflow for Conversational Design
17:11 - Handling both freeform and directed conversations with LangGraph
20:40 - Vodo Drive: A deeper dive into a real-world LangGraph implementation
26:31 - The LLM's role: Tool Selection, Execution, and Output Generation
29:51 - Managing Latency and Optimizing LLM Calls
32:08 - Working with JSON responses: Tools vs Human-Readable Output
36:04 - The bigger picture: LangGraph's role in building robust AI agents
Don't forget to subscribe and leave your comments below, especially if you're using LangGraph in your own projects!
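If you want a feel for the node / conditional-edge / state ideas before pressing play, here's a minimal sketch using the LangGraph Python package (the episode demos the JavaScript version, but the concepts map one-to-one). The state shape and the stubbed-out model call are invented for the example.
```python
# Tiny LangGraph state machine: one node, one conditional edge, shared typed state.
# call_model() is a stub standing in for an LLM + tools loop.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    answer: str

def call_model(state: AgentState) -> AgentState:
    # Placeholder for the LLM call that would pick tools and produce an answer.
    return {"question": state["question"], "answer": f"Echo: {state['question']}"}

def should_continue(state: AgentState) -> str:
    # Conditional edge: loop back to the model until we have an answer, then stop.
    return END if state["answer"] else "model"

graph = StateGraph(AgentState)
graph.add_node("model", call_model)
graph.set_entry_point("model")
graph.add_conditional_edges("model", should_continue)
app = graph.compile()

print(app.invoke({"question": "What is LangGraph?", "answer": ""}))
```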
#GenAI #GenerativeAI #LangChain #LangGraph #AIAgents #Agents
Thumbnail by Imagen 3 with prompt:
Podcast thumbnail, illustrative with a touch of tech, cartoonish.
Scene: Two podcast hosts, sitting in front of microphones,
smiling and engaging in conversation.
Both hosts are male, caucasian, software developers in their early 50s, wearing glasses, and are clean shaven.
The host on the left is wearing a blue t-shirt and a brown flat cap.
The host on the right is wearing a light blue polo shirt.
Background: A flowchart consisting of boxes and diamonds, connected with glowing strands of light.
Negative prompt: beards

Episode 205 - Gemini + LangGraph Agents + Google Sheets = Vodo Drive
Join us as we explore Vodo Drive, an innovative project that leverages Google's Gemini AI to revolutionize how we interact with spreadsheets. Creator Allen Firstenberg takes us behind the scenes, revealing the architecture, challenges, and breakthroughs of building an agentic system that understands and manipulates data like never before.
Discover how Vodo Drive:
* Empowers natural language interaction: Say goodbye to rigid formulas and hello to conversational commands.
* Integrates image recognition: Effortlessly input data by simply taking pictures.
* Provides real-time feedback: Experience transparent processing with live updates on your requests.
* Prioritizes security and user control: Maintain data privacy and manage permissions seamlessly.
More Info:
* Vodo Drive: https://vodo-drive.com/
* Gemini API on Vertex AI: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
* LangChain: https://www.langchain.com/langchain
* LangGraph: https://www.langchain.com/langgraph
* Google Sheets: https://workspace.google.com/products/sheets/
* Firebase: https://firebase.google.com/
Timestamps:
* (0:00:00) Introduction and Project Overview: Discover the inspiration and goals behind Vodo Drive's participation in the Gemini API competition.
* (0:03:30) Reimagining Spreadsheet Control: Explore the evolution of Vodo Drive from voice-controlled spreadsheets to an AI-powered agentic system.
* (0:07:45) The Power of Visual Input: Learn how Vodo Drive seamlessly integrates image recognition to extract and input data from pictures.
* (0:11:55) Contextual Awareness and Conversational Flow: Delve into the importance of contextual awareness and how Vodo Drive maintains the flow of information.
* (0:14:30) Optimizing Tasks with the Right Tools: Understand the strategic use of spreadsheets as the computational backbone for Vodo Drive's data processing.
* (0:15:30) System Design and Architecture Breakdown: Get a detailed look at the core components of Vodo Drive, including Firebase Cloud Functions, Firestore, and Authentication.
* (0:22:55) Addressing Security Concerns: Explore the safety measures implemented to protect user data and prevent unauthorized actions.
* (0:26:35) Real-Time Updates and User Experience: Discover how Vodo Drive leverages Firestore to provide real-time feedback and enhance user experience.
* (0:32:30) Behind the Scenes: The AI's Internal Dialogue: Uncover the hidden conversations happening between the agent and the LLM during data processing.
* (0:38:05) Firebase Authentication and Authorization: Learn how Vodo Drive ensures secure access to user spreadsheets and leverages Google's authorization system.
* (0:40:45) Firebase Cloud Storage and Media Handling: Explore the role of cloud storage in managing user-uploaded photos and audio files.
* (0:43:35) Gemini's Role in Image Processing and Agentic Logic: Discover how Gemini powers both image recognition and the decision-making process of the agentic system.
Don't miss this insightful discussion on the future of AI-powered data management and how Vodo Drive is paving the way for a more intuitive and efficient user experience.
#GeminiAPI #LLM #AgenticSystems #VoiceControl #Spreadsheets #Firebase #WebDevelopment #AndroidDevelopment #AI #Innovation

Episode 204 - Alexa Skill Sunset Strategies
In this episode of Two Voice Devs, Allen and Mark discuss the considerations and strategies for shutting down an Alexa skill. They explore various reasons why developers might choose to sunset their skills, including declining usage, deprecated features, and the evolving Alexa landscape. They also delve into the technical aspects of skill removal, highlighting the options of hiding or removing a skill and the implications of each choice, especially when dealing with in-skill purchases and subscriptions. Mark shares his own experiences with managing his six Alexa skills, offering insights into his decision-making process for each one.
Timestamps:
* 0:00:00 Introduction
* 0:00:12 Why shut down an Alexa skill?
* 0:01:11 Decreased promotions, uncertainty about Alexa's future, AWS costs
* 0:04:35 Alexa-hosted skills and AWS free tier
* 0:06:02 Considerations beyond cost: analytics, deprecated features, impact on users
* 0:06:56 Challenges with in-skill purchases and subscriptions
* 0:08:01 Skill hiding vs. skill removal
* 0:10:26 Case Study: "Serve More" skill and lack of organizational adoption
* 0:13:35 Case Study: "Snatch Word" game and handling subscriptions during sunset
* 0:17:46 Managing user communication during skill shutdown
* 0:17:58 Case Study: "Picture Guesser" skill and content challenges
* 0:19:46 Case Studies: "Show Widgets" and "Busy Timers" – skills with ongoing personal use
* 0:22:29 Building skills for personal use first
* 0:23:32 Importance of planning a sunset process for brand image and user experience
* 0:24:27 Call to action for audience sharing
Join the conversation and share your own experiences with shutting down Alexa skills!

Episode 203 - Imagen 3: Stunning Realism & Ethical Questions
Join Allen and Linda as they dive into Google's Imagen 3 and Imagen 3 Fast, a powerful new set of image generation models. We explore its capabilities, pricing, features, and limitations, including a deep dive into the API and how to use it with Python code.
This episode features an in-depth look at Imagen 3's photorealism and comparison with its predecessor, Imagen 2. We examine the ethical implications of AI image generation, discussing copyright issues, plagiarism concerns, and the impact on artists.
Don't miss the stunning visuals and thought-provoking discussion!
Resources:
* https://console.cloud.google.com/vertex-ai/generative/vision
* https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
* https://fc-art.medium.com/seurat-pointillism-e240074a03dc
* https://commons.wikimedia.org/wiki/File:La_Libert%C3%A9_guidant_le_peuple_-_Eug%C3%A8ne_Delacroix_-_Mus%C3%A9e_du_Louvre_Peintures_RF_129_-_apr%C3%A8s_restauration_2024.jpg
Timestamps:
* 00:00:31 : Introducing Imagen 3 & its Photorealistic Power
* 00:02:59 : Imagen 3 vs. Imagen 3 Fast: Speed, Quality, & Pricing
* 00:05:33 : Copyright & Commercial Use of Imagen-Generated Images
* 00:06:14 : Exploring the Imagen API with Python Code Examples
* 00:09:08 : Using Gemini to Generate Prompts for Imagen
* 00:11:15 : The Importance of Seed Control for Image Consistency
* 00:13:24 : Watermarking & Identifying AI-Generated Images
* 00:14:51 : Navigating Imagen's Safety Filters & Limitations
* 00:18:13 : Live Demo: Generating a Cat Image with Imagen 3
* 00:18:55 : Future Potential: Editing & Outcropping Capabilities
* 00:22:26 : Upscaling Images with Imagen: Costs & Possibilities
* 00:23:18 : Comparing Image Styles Across Imagen Versions (with Visuals!)
* 00:28:58 : Confronting the Ethical Concerns of AI Image Generation
* 00:29:05 : Real-World Examples: Inappropriate Content & Plagiarism
* 00:36:30 : The Impact of AI on Artists & the Definition of Art
* 00:37:06 : Transparency & Responsibility: Crediting AI in Creative Work
* 00:39:53 : Final Thoughts: Will We Continue Using Imagen 3?
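For a taste of the Python walkthrough from the episode, here's a minimal sketch of calling Imagen 3 through the Vertex AI SDK. The project ID, location, and model identifier are assumptions for the example; check the Vertex AI documentation for the model versions available to your project.
```python
# Minimal Imagen-on-Vertex-AI sketch. Project, location, and model ID are placeholders.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")
response = model.generate_images(
    prompt="A pointillism-style painting of a cat reading a book",
    number_of_images=1,
)
response.images[0].save("cat.png")
```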
Thumbnail created with Imagen 3 using the prompt:
Create a compelling thumbnail for a YouTube video podcast about AI image generation, specifically Google's Imagen 3, featuring two hosts. Include two distinct, friendly faces – one male, one female – representing the podcast hosts. The male host should be wearing glasses and a light blue collared shirt. The female host should be wearing glasses, her hair tied back in a pony tail, and wearing a grey t-shirt that says "MakerSuite". They should be facing the viewer with engaging expressions, perhaps a mix of excitement and contemplation. Showcase a visually striking AI-generated image emerging from a laptop screen or a thought bubble. The overall image should be done in the style of George Seurat.
(Episode number and title added manually.)
#VertexAI #Imagen3 #GenAI

Episode 202 - Hosting and Large Language Models
Join Allen Firstenberg and Mark Tucker on Two Voice Devs as they discuss the challenges and solutions of hosting large language models (LLMs). They explore various hosting environments, including Firebase, AWS Amplify, Vertex AI, and Docker/Kubernetes, comparing their strengths and weaknesses.
Allen shares his experience with Firebase Cloud Functions and the seamless integration with Google Cloud services, while Mark tackles the complexities of Docker, Kubernetes, and enterprise-level deployment strategies. From managing API keys and credentials to implementing design patterns and best practices, they explore the challenges and solutions for building robust and scalable AI systems.
This episode is packed with practical tips for developers, covering topics like:
[00:02:00] Firebase Suite of Tools: Learn how Firebase provides a comprehensive platform for hosting LLMs, including real-time databases, cloud storage, cloud functions, and authentication.
[00:04:00] Firebase vs. AWS Amplify: Discover the key differences between these two popular serverless platforms and their database options.
[00:05:00] Cloud Service Accounts for Security: Allen demonstrates how leveraging cloud service accounts can simplify permission management and enhance security.
[00:11:00] Architecture Design and Long-Term Hosting: Allen emphasizes the importance of considering future scalability and maintenance when selecting a hosting environment.
[00:12:30] Working with Docker and Kubernetes: Mark dives into his experience using Docker containers and Kubernetes for enterprise-level LLM deployment.
[00:15:00] Learning Python for LLM Development: Mark shares his experience learning Python for working with LLMs and using libraries like FastAPI for REST API development.
[00:17:00] Design Patterns and Best Practices: Allen and Mark discuss the evolving nature of design patterns and their importance in modern software development.
[00:20:00] KitOps for Model Deployment: Mark explains how KitOps can be used to separate model deployment from service deployment in a Kubernetes environment.
[00:23:00] Docker and Configuration Management: Allen discusses the challenge of configuration management in Docker environments and how to manage changes efficiently.
[00:24:00] Enterprise Security and Tooling: Mark explores the use of tools like HashiCorp Consul and Vault for managing configurations and secrets in enterprise deployments.
[00:26:00] The Importance of Containerization: Allen and Mark reiterate the fundamental role of containers in modern software development.
Don't miss this insightful episode of Two Voice Devs, where you'll gain valuable insights and practical tips for hosting and deploying your own LLMs!
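As a small illustration of the FastAPI pattern Mark describes, here's a minimal sketch of a REST endpoint sitting in front of a model call. The generate() stub stands in for whatever hosted or API-based model you actually serve; nothing here is taken from Mark's production setup.
```python
# Thin REST wrapper around a model call, easy to drop into a Docker image.
# generate() is a stub; replace it with your real LLM client or local model.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PromptRequest(BaseModel):
    prompt: str

def generate(prompt: str) -> str:
    return f"(model output for: {prompt})"

@app.post("/generate")
def generate_endpoint(req: PromptRequest) -> dict:
    return {"completion": generate(req.prompt)}

# Run locally with: uvicorn main:app --reload
```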
#AI #Development #Hosting #Cloud #Docker #Kubernetes #Firebase #GoogleCloud #DesignPatterns #TwoVoiceDevs

Episode 201 - Introduction to KitOps for MLOps
Join Allen and Mark in this episode of Two Voice Devs as they dive into the world of MLOps and explore KitOps, an open-source tool for packaging and versioning machine learning models and related artifacts. Learn how KitOps leverages the Open Container Initiative (OCI) standard to simplify model sharing and deployment.
More info:
- https://kitops.ml
Key Topics and Timestamps:
- What is DevOps? (0:00:41) - Allen and Mark discuss the fundamentals of DevOps and its role in software development and operations.
- Introduction to MLOps (0:04:02) - The conversation shifts to MLOps, highlighting the unique challenges and requirements of managing machine learning models in production.
- What is KitOps? (0:07:27) - Mark introduces KitOps and explains its core functionality for packaging models, code, data, and documentation into a single, versioned unit.
- Understanding the Kit File (0:16:33) - A closer look at the structure and components of a Kit file, the YAML-based configuration file used by KitOps.
- The Kit CLI and Model Kits (0:18:20) - Explanation of the Kit command-line interface and the concept of model kits as containerized packages.
- Docker Analogy and Key Differences (0:20:03) - Allen and Mark draw parallels between KitOps and Docker, emphasizing the focus on file management and versioning in KitOps.
- Benefits of Using KitOps (0:26:35) - Discussion on the advantages of using KitOps over alternative approaches like zip files or direct GitHub storage.
- Workflow Examples and Integration (0:29:05) - Practical examples of how KitOps can be integrated into data science and development workflows, including build pipelines and production deployments.
- Call to Action and Community Engagement (0:33:02) - Encouraging viewers and listeners to share their thoughts, experiences, and alternative solutions in the comments and on social media.
Don't miss this insightful episode of Two Voice Devs as Allen and Mark demystify KitOps and its potential for streamlining your MLOps practices. Subscribe to our channel for more developer-focused discussions and tutorials!
#MLOps #KitOps #MachineLearning #DevOps #AI #SoftwareDevelopment #Podcast #TwoVoiceDevs

Episode 200 - Four Years and Looking Forward
Mark Tucker and Allen Firstenberg celebrate 200 episodes and four years of Two Voice Devs! In this special episode, they reflect on the journey so far, the evolution of the AI landscape, and what excites them most about the future of development.
Join them as they discuss:
00:00 Four years ago...
00:10 The evolution of large language models (LLMs) and how the landscape has shifted over the past year.
03:10 The emergence of new players in the AI model space and how Google, Microsoft, and Amazon are vying for dominance.
05:30 The growing trend of smaller and locally deployable models and the future of AI development.
08:00 The ongoing quest for seamless integration of conversational AI with web experiences.
10:30 The need for a convergence of traditional NLU concepts with modern AI approaches.
11:30 The pressing need for sustainability and responsible development in the AI space.
14:00 The importance of integrating AI tools with existing methods and workflows.
16:00 An open invitation for developers to join Mark and Allen as co-hosts and share their perspectives on AI development.
18:00 A reminder that learning is at the heart of the developer experience and the importance of community.
20:00 The highlights from their favorite episodes over the past four years.
23:00 The value of connection and friendship within the developer community.
26:09 Four years ago...
Don't miss this milestone episode as Two Voice Devs look back and look forward!

Episode 199 - Is the Future of AI Local?
Join Allen Firstenberg and Roger Kibbe as they delve into the exciting world of local, embedded LLMs. We navigate some technical gremlins along the way, but that doesn't stop us from exploring the reasons behind this shift, the potential benefits for consumers and vendors, and the challenges developers will face in this new landscape. We discuss the "killer features" needed to drive adoption, the role of fine-tuning and LoRA adapters, and the potential impact on autonomous agents and an appless future.
Resources:
* https://developer.android.com/ai/aicore
* https://machinelearning.apple.com/research/introducing-apple-foundation-models
Timestamps:
00:20: Why are vendors embedding LLMs into operating systems?
04:40: What are the benefits for consumers?
09:40: What opportunities will this open up for app developers?
14:10: The power of LoRA adapters and fine-tuning for smaller models.
17:40: A discussion about Apple, Microsoft, and Google's approaches to local LLMs.
20:10: The challenge of multiple LLM models in a single browser.
23:40: How might developers handle browser compatibility with local LLMs?
24:10: The "three-tiered" system for local, cloud, and third-party LLMs.
27:10: The potential for an "appless" future dominated by browsers and local AI.
28:50: The implications of local LLMs for autonomous agents.

Episode 198 - Wisdom from Unparsed: LLMs are Hammers, Not Silver Bullets
Join us on Two Voice Devs as we welcome back Roger Kibbe. Fresh off emceeing the developer track at the Unparsed Conference in London, Roger shares his insights on the biggest takeaways, trends, and challenges facing #GenAI, #VoiceFirst and #ConversationalAI developers today.
Get ready for a dose of reality as Roger emphasizes the need to view LLMs as powerful tools – think hammers – rather than magical solutions. We dive deep into:
Timestamps:
* 0:00 - Intro
* 1:56 - Exploring the Unparsed Conference
* 4:47 - LLMs: The hype vs. the reality for developers
* 6:37 - The underappreciated power of LLMs for "understanding", not just generating
* 11:03 - The right tool for the job: Why a toolbox approach is essential for conversational AI
* 13:52 - Beyond the chatbot: Detecting emotion and the future of human communication
* 20:28 - Hackathon highlights and the need for more realistic QA approaches
* 28:55 - Navigating the shift from deterministic to stochastic systems
* 31:59 - Will AI replace junior developers?
* 36:30 - How senior developers can (and can't) benefit from AI coding assistants
* 39:04 - Final thoughts: The value of cutting through the hype
Don't miss this insightful conversation about the future of conversational AI development – grab your toolbox and hit play!

Episode 197 - Alexa Skill Development in the Age of LLMs
What should people developing with LLMs learn from a decade of experience building Alexa skills? How will Alexa skill developers leverage the latest #GenerativeAI and #ConversationalAI tools as they continue to build #VoiceFirst and multimodal skills?
Join Allen and Mark on Two Voice Devs as they delve into the evolving landscape of Alexa skill development in the era of large language models (LLMs). Sparked by a thought-provoking discussion on the Alexa forums, they explore the potential benefits and challenges of integrating LLMs into skills.
Key topics and timestamps:
(0:00:00) Introduction
(0:02:00) LLMs and the Future of Alexa Skills
(0:04:00) Limitations of Current Alexa Skill Model with LLMs
(0:07:00) Benefits and Drawbacks of Developing for Alexa
(0:10:30) Overlooked Potential of Multimodality with LLMs
(0:14:50) Lessons from Early Voice Experiences
(0:17:00) Intents vs. Tool/Function Calling
(0:21:30) Handling Hallucinations and Off-Topic Requests
(0:22:00) LLMs' Ability to Handle Nuanced Intents
(0:28:00) Cost Considerations of LLMs
(0:32:00) Monetizing LLM-Powered Alexa Skills
(0:39:40) The Future of Alexa Skill Development: A Hybrid Approach?
(0:40:00) Outro
Tune in as they discuss the need for hybrid models, the importance of conversation design, and the uncertain future of monetization in this rapidly changing landscape. Don't forget to join the conversation on the Alexa Slack channel or leave your thoughts in the comments below!

Episode 196 - Is GPT 4o a Game Changer?
OpenAI's ChatGPT 4o and GPT 4o announcements have sent shockwaves through the developer community! In this episode of Two Voice Devs, Mark and Allen dive into the implications of these new models, comparing them to Google's Gemini.
We discuss:
[00:00:10] Initial takeaways from the OpenAI presentations.
[00:02:29] The impressive voice capabilities of ChatGPT 4o.
[00:04:49] Concerns about OpenAI's ambitions for conversational AI.
[00:07:30] The difference between "doing" and "knowing" AI systems.
[00:14:15] A detailed breakdown of GPT 4o, including its strengths and weaknesses.
[00:17:43] Comparison with Gemini and implications for developers.
[00:19:41] The importance of competition in driving innovation and lowering prices.
[00:21:48] The future of AI assistants and the role of developers.
Let us know what you think about GPT 4o and Gemini! Have you used them? Share your experiences and thoughts in the comments below.

Episode 195 - Android, Agents, and the Rabbit R1
Allen Firstenberg chats with fellow Google Developer Expert (GDE) Mike Wolfson about his career, the evolution of Android, and his new interest in generative AI. Mike shares his thoughts on the future of AI with agents, Large Action Models (LAMs), and the potential of the "Rabbit," a new AI-powered device. Does the Rabbit live up to its promise? If not - what could?
Timestamps:
00:00:00 - Introduction
00:01:32 - Mike's career journey
00:04:15 - Transition from enterprise Java to Android development
00:05:04 - Creating "Droid of the Day" app
00:06:49 - Becoming an Android developer and Google Developer Expert
00:09:23 - Shift in focus from Android to generative AI
00:10:57 - Generative AI as a platform
00:11:47 - The Rabbit and its potential
00:14:59 - Mike's take on the Rabbit as a developer
00:17:31 - Current integrations with the Rabbit
00:19:52 - The future of AI and the Rabbit
00:24:46 - Edge AI and its potential
00:27:16 - The capabilities of the Rabbit and its future
00:32:17 - The Rabbit vs. other devices like meta glasses
00:34:28 - Conclusion and call to action

Episode 194 - Google AI/O 2024
Join Allen and Roya as they dissect the major AI announcements from Google I/O 2024. From Gemini updates and new models to responsible AI and groundbreaking projects like ASTRA, this episode dives into the future of AI development.
Timestamps:
[00:00:00] Introduction and Google I/O Overview
[00:02:00] Gemini 1.5 Flash & Gemini 1.5 Pro: New Models and Features
[00:04:30] AI Studio Access Expansion for Europe, UK & Switzerland
[00:06:20] Choosing the Right AI Model for Your Project
[00:06:50] Gemini Nano in Google Chrome: Bringing AI to the Browser
[00:08:00] PaliGemma: Open Source Model with Image & Text Input
[00:08:50] AI Red Teaming & Model Safety Tools
[00:09:50] Parallel Function Calling for Developers
[00:10:30] Video Frame Extraction: Easier Multimodal Development
[00:11:20] GenKit: Firebase's Generative AI Integration
[00:12:00] Gems: Customizable Gemini for Developers
[00:12:50] Semantic Embeddings: Understanding & Creating Images
[00:13:50] Imagen 3: API Access for Image Generation
[00:14:20] Veo: Video Generation with Lumiere Architecture
[00:14:50] SynthID: Watermarking & Identifying Generated Content
[00:16:30] Responsible AI & Inclusivity
[00:18:00] Gemini Developer Competition: Win a DeLorean & Cash Prizes!
[00:19:30] Project ASTRA: Multimodal AI with Contextual Memory
[00:21:00] Google Glasses & Project ASTRA Integration
[00:22:00] Closing Thoughts: AI for Everyone

Episode 193 - Revolutionizing Intent Classification
Join Allen and Mark as they delve into Voiceflow's groundbreaking new feature: intent classification using a hybrid of LLMs and classic NLU models. Discover how this innovative approach leverages the strengths of both technologies to achieve greater accuracy and flexibility in understanding user intent. How they're doing it just may blow your mind! 🤯
Timestamps:
0:00:00 - Introduction
0:00:33 - Exploring the concept of intents and slots in conversational UI
0:05:11 - Understanding Natural Language Understanding (NLU) and its role in intent classification
0:06:02 - Voiceflow's hybrid approach: Combining classic NLU with LLMs
0:08:36 - Deep dive into Voiceflow's documentation on intent classification using LLMs
0:13:43 - Understanding the hybrid approach and its components: intent descriptions, prompt wrappers, and training data
0:24:31 - How the classic NLU model pre-filters intents for the LLM, improving efficiency and accuracy
0:27:27 - Exploring the user experience and the flow of intent classification with the hybrid model
0:32:53 - Voiceflow's commitment to open research and sharing knowledge with the developer community
0:35:52 - The value of benchmarking and analyzing different LLM models for intent classification
0:39:12 - Call to action: Share your thoughts and experiences with Voiceflow's hybrid approach
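To make the hybrid idea concrete, here's a small conceptual sketch in Python: a classic NLU ranker narrows the field to a few candidate intents, and an LLM prompt built from intent descriptions makes the final call. This is our paraphrase of the published approach, not Voiceflow's code; the intents, stubs, and prompt wording are invented for the example.
```python
# Hybrid intent classification sketch: NLU pre-filter + LLM final decision.
# nlu_rank() and call_llm() are stubs for a real classifier and a real LLM call.
INTENT_DESCRIPTIONS = {
    "order_pizza": "The user wants to order or customize a pizza.",
    "track_order": "The user asks where their existing order is.",
    "cancel_order": "The user wants to cancel an order they placed.",
}

def nlu_rank(utterance: str, top_n: int = 2) -> list[str]:
    # Stand-in for a trained NLU model returning its most likely intents.
    return ["order_pizza", "track_order"][:top_n]

def call_llm(prompt: str) -> str:
    # Stand-in for an LLM call that returns a single intent name.
    return "order_pizza"

def classify(utterance: str) -> str:
    candidates = nlu_rank(utterance)
    described = "\n".join(f"- {name}: {INTENT_DESCRIPTIONS[name]}" for name in candidates)
    prompt = (
        "Pick the single best intent for the user message.\n"
        f"Candidate intents:\n{described}\n"
        f"User message: {utterance}\n"
        "Answer with the intent name only."
    )
    return call_llm(prompt)

print(classify("I'd like a large pepperoni, please"))
```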

Episode 192 - Google Cloud Next 2024 Recap
Join Allen Firstenberg and guest host Stefania Pecore on Two Voice Devs as they delve into the exciting announcements and highlights from Google Cloud Next 2024! This episode focuses on the latest advancements in AI and their impact on the healthcare industry, providing valuable insights for developers and tech enthusiasts.
Learn more:
* https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2024-wrap-up
Timestamps:
00:00:00: Introduction
00:01:02: Stefania's background and journey into AI
00:07:20: Stefania's overall experience at Google Cloud Next
00:11:59: Focus on Healthcare and AI applications, including Mayo Clinic's Solution Studio
00:15:38: Exploring the new Gemini product suite and its features like code assistance and data analysis
00:20:44: Discussing Gemini API updates, including the 1.5 public preview with 1M token context window and grounding tools
00:26:06: Vertex AI Agent Builder and its no-code approach to chatbot development
00:33:02: Hardware announcements, including the A3 VM with NVIDIA H100 GPUs
00:35:24: Stefania's reflections on Cloud Next and the value of attending
Tune in to discover the future of AI and its transformative potential, especially in the healthcare sector. Share your thoughts on the Google Cloud Next announcements in the comments below!

Episode 191 - Beyond the Hype: Exploring BERT
This episode of Two Voice Devs takes a closer look at BERT, a powerful language model with applications beyond the typical hype surrounding large language models (LLMs). We delve into the specifics of BERT, its strengths in understanding and classifying text, and how developers can utilize it for tasks like sentiment analysis, entity recognition, and more.
Timestamps:
0:00:00: Introduction
0:01:04: What is BERT and how does it differ from LLMs?
0:02:16: Exploring Hugging Face and the BERT base uncased model.
0:04:17: BERT's pre-training process and tasks: Masked Language Modeling and Next Sentence Prediction.
0:11:11: Understanding the concept of masked language modeling and next sentence prediction.
0:19:45: Diving into the original BERT research paper.
0:27:55: Fine-tuning BERT for specific tasks: Sentiment Analysis example.
0:32:11: Building upon BERT: Exploring the Roberta model and its applications.
0:39:27: Discussion on BERT's limitations and its role in the NLP landscape.
Join us as we explore the practical side of BERT and discover how this model can be a valuable tool for developers working with text-based data. We'll discuss its capabilities, limitations, and potential use cases to provide a comprehensive understanding of this foundational NLP model.
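If you want to see the masked-language-modeling behavior from the episode for yourself, here's a minimal sketch using the Hugging Face transformers pipeline with the public bert-base-uncased checkpoint; the example sentence is our own.
```python
# BERT masked-language-modeling demo via the transformers fill-mask pipeline.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("Voice developers love building [MASK] interfaces."):
    print(f'{prediction["token_str"]:>12}  score={prediction["score"]:.3f}')
```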