August 14, 2025
At the @aiDotEngineer World's Fair in June, @shresbm and I gave a talk about all the "magic" that goes into making great voice AI experiences.
Magic, in the sense of making hard things look easy. Magic, in the sense of sufficiently advanced technology being indistinguishable from.
Shrestha and her team train the LLMs and make the APIs we all use. I work mostly higher up, at the orchestration and application levels.
We thought it would be fun to show the push-pull tension between making use of the open-ended, emergent capabilities of today's SOTA models and writing scaffolding that makes model behavior predictable enough to use as part of production systems.
I hacked together a very, very simple @pipecat_ai app designed to interactively explore the "jagged frontier" of prerelease Gemini 2.5 Flash and Pro models. These models have built-in Google search, code generation, and compound tool use. They are good at figuring out how to do things agentically. And they do surprising things, sometimes good, sometimes not so good.
We experimented with the app live during the talk. I've done a lot of live demos. I've rarely seen an audience laugh as hard or have as much fun as this room did watching Shrestha talk to Gemini Live.
Video of the full talk, part of the AI Engineer World's Fair "Voice AI" playlist. It goes into detail about what it takes to build reliable, production-grade voice agents today.