April 10, 2025
We wrote down everything we've learned building voice AI agents over the past two years.
Topics include core technology choices, minimizing latency, managing multimodal context, interruption handling, turn detection, evals, state machines, guardrails, memory, async and realtime function calling, and more.
Plus diagrams, charts, and code samples.
The text is open source, so feel free to submit PRs.
Here's the link [1].
Sascha Mombartz did the print and graphic design. I highly recommend working with Sascha if you have the opportunity [2].
Working with a talented designer is so, so, so much fun. Sascha consciously incorporated AI…
This guide was initially available only in print. We wrote the first version of the text for the @aiDotEngineer Summit in New York in February.
We've updated it to include information about new models, feedback from people who read the print version, and additional code samples.
If you want an original print version, come to the SF Voice AI Meetup this Wednesday. We have about 20 copies left.
https://t.co/yllK2lwf22