This is fantastic. ** We need more and better AI eval and observability tools…

November 18, 2024

This is fantastic. ** We need more and better AI eval and observability tools — in general, and especially for conversational voice AI **

At a dinner last week hosted by @mariabrw, we all went around the table and offered a prediction for the next year of AI.

I went last, by which time three people had already made predictions that I would have made, too. Which probably means I wasn't thinking big enough! But at that point I figured I'd go the other way and offer something so mundane that it's maybe being broadly overlooked.

My prediction for next year is more a requirement than a prediction: we need better eval tools (broadly defined).

SOTA LLMs like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Flash are very capable conversation and function-calling engines.

Orchestration frameworks like @pipecat_ai make it easy to build multi-turn, multi-state conversation apps that integrate with backend systems.

The biggest thing holding us back from faster and broader deployment to production is: it's too hard to go from "this thing works 90% of the time and when it works it's really valuable" to "we know that this thing works 99.9% of the time."
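To put rough numbers on that gap, here is a minimal sketch of how many test conversations it takes before the data supports a "99.9%" claim. It assumes each test conversation is an independent pass/fail trial, which real multi-turn calls only approximate. The function names are made up for illustration, and the Wilson score interval is just one reasonable choice of statistic.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for an observed pass rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumption; pick your own).
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def runs_needed(target: float, z: float = 1.96) -> int:
    """Smallest number of runs, all passing, whose Wilson lower bound reaches target.

    With zero failures the lower bound simplifies to n / (n + z^2),
    so n must be at least target * z^2 / (1 - target).
    """
    return math.ceil(target * z * z / (1 - target))


if __name__ == "__main__":
    # 90 passes out of 100 runs: what pass rate can we actually claim?
    print(f"90/100 lower bound: {wilson_lower_bound(90, 100):.3f}")
    # How many flawless runs before we can claim 90% vs. 99.9%?
    print(f"runs for 90%:   {runs_needed(0.90)}")
    print(f"runs for 99.9%: {runs_needed(0.999)}")
```

Under these assumptions, a few dozen clean runs are enough to back up "works 90% of the time," while "99.9%" takes several thousand runs without a single failure. Nobody does that by hand, which is why the tooling matters.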

There are at least two categories of things to build in this space: tooling that helps us test as we develop, and tooling that helps us see what's happening in production.

Tom Shapland (@tom_shapland)

My cofounder, Adrian Cowham, ships fast. Here's one of the cool new features he's built to help Voice AI developers and AI agencies know what's happening in their Voice AI calls. Link to the demo is in the comments.