April 25, 2026
My thesis here is that building user-facing AI applications is a full-stack software engineering activity that spans every layer of the stack:
- model
- inference code
- APIs that serve and facilitate inference (including prompt caching, safety guardrails, etc.)
- orchestration frameworks
- app-specific harness and prompt logic that sits directly on top of the orchestration layer
- the "UX" that the user interacts with
For voice agents, we've been refining the orchestration, logic, and UX layers for more than two years. We have a very good understanding of the best practices for building low-latency, reliable, successful voice agents for a wide range of use cases. You've probably talked to a customer support voice agent recently!
But we're also pushing the boundaries of what today's models can do, so there are things we want the APIs, inference stacks, and models to do better. We're starting to fine-tune LLMs for specific use cases. We're building increasingly complex multi-agent/multi-model systems to do increasingly interesting things!
For example: very long-running conversational agents, multimodal copilots that understand everything happening on your computer, and massively multiplayer games.
Built to Ship with @kwindla: learning why building great AI agents isn’t just a prompt or model problem. It is still an engineering problem, with a specialization in LLMs.
