← kwindla hultman kramer


November 12, 2024

In June we shipped the (then) world's fastest voice AI agent — 500ms voice-to-voice latency including network transport.

The (not-so-) secret sauce behind that demo was @pipecat_ai + @cerebriumai serverless AI infrastructure + @trydaily WebRTC infrastructure.

The team at @cerebriumai is live on Product Hunt today. They've built infrastructure that is super impressive: very fast cold-start times, access to a bunch of different GPUs, support for realtime AI, seamless scaling.

I've learned a *lot* from working with them over the past few months on customer-facing deployments and several fun demos.

Go check out what Cerebrium is doing if you're interested in deploying your own AI models and services! -> https://t.co/3WyDezbFHY

As a matter of historical interest, here's the original World's Fastest Voice AI Bot demo and technical write-up:

https://t.co/SYvOi0IjXJ

kwindla (@kwindla)

How to build the world's fastest voice AI bot:

- Self-host speech-to-text, LLM inference, and text-to-speech all together in the same container/cluster.
- Route audio over the internet using WebRTC and edge networking.
- Configure timings for voice activity detection, phrase endpointing, and other parts of the pipeline to optimize for latency. (There are trade-offs to doing this!)
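To make the endpointing trade-off concrete, here is a minimal sketch in Python. It is not the code from the demo: the function name `endpoint_time_ms`, the 20 ms frame size, and the sample utterance are all assumptions for illustration. It takes per-frame speech/non-speech flags (what a voice activity detector would emit) and declares the phrase finished after a configurable run of silence.

```python
from typing import Optional, Sequence


def endpoint_time_ms(
    speech_flags: Sequence[bool], frame_ms: int, silence_ms: int
) -> Optional[int]:
    """Return the time (ms from start) at which end-of-phrase is declared.

    speech_flags[i] is True when frame i contains speech (hypothetical VAD output).
    The phrase ends once some speech has been heard and `silence_ms` of
    consecutive non-speech has followed it. Returns None if that never happens.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return (i + 1) * frame_ms
    return None


# Hypothetical utterance in 20 ms frames (made-up data, not a measurement):
# 300 ms speech, a 400 ms mid-sentence pause, 300 ms speech, then 1000 ms silence.
frames = [True] * 15 + [False] * 20 + [True] * 15 + [False] * 50

aggressive = endpoint_time_ms(frames, frame_ms=20, silence_ms=200)
conservative = endpoint_time_ms(frames, frame_ms=20, silence_ms=800)
print(aggressive, conservative)
```

With the 200 ms threshold the endpoint fires during the mid-sentence pause, so the bot would start responding before the speaker has finished. With the 800 ms threshold the whole utterance is captured, but 800 ms of waiting is added after the speaker stops. That is the trade-off: shorter timings lower latency and raise the chance of interrupting.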

Here's a Llama 3 voice bot that has voice-to-voice response times of ~500ms.

We used @DeepgramAI's STT and TTS for this bot, and everything is hosted on @cerebriumai's serverless GPU infrastructure.

Video from @kwindla's post