October 26, 2025
Join the @AITinkerers community for an evals-focused meetup in SF on Thursday.
I keep telling people that if you're getting started building voice agents today, evals are the hardest part of the learning curve. We have solid 80/20 solutions to everything else: the right basic architecture, model choice, low latency, turn detection, good context engineering, network transport, infrastructure/deployment.
Teams that follow the current best practices go pretty quickly from zero to POC, and then POC to scale. Choose a narrowly scoped initial use case, get it right, and then add more workflows. But ... evals are still hard. Audio input and output, multi-turn conversations, and the fact that almost all agents use multiple models: we're still figuring out the right shape of eval and simulation tools for voice.
how are you testing your ai tools/apps?
vibes? are you subscribed to the eval industrial complex?
there's been a lot of discussion on the rise of Big Eval and whether it's overkill. But also maybe Evals are back!?
come riff with the best builders in the bay at the beautiful