June 27, 2024
How to build the world's fastest voice AI bot:
- Self-host speech-to-text, LLM inference, and text-to-speech all together in the same container/cluster.
- Route audio over the internet using WebRTC and edge networking.
- Configure timings for voice activity detection, phrase endpointing, and other parts of the pipeline to optimize for latency. (There are trade-offs to doing this!)
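To make the timing trade-off in the last bullet concrete, here is a minimal sketch of phrase endpointing driven by per-frame voice activity flags. It is not the code this bot uses: `EndpointConfig`, `find_endpoints`, `frame_ms`, and `stop_ms` are hypothetical names, and the values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class EndpointConfig:
    frame_ms: int = 20   # duration of one audio frame (assumed value)
    stop_ms: int = 200   # trailing silence needed to end a phrase (assumed value)


def find_endpoints(speech_flags, cfg):
    """Return the times (in ms) at which a phrase end is declared.

    speech_flags holds one bool per frame: True if a VAD classified
    that frame as speech, False otherwise.
    """
    endpoints = []
    in_phrase = False
    silence_ms = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            in_phrase = True
            silence_ms = 0
        elif in_phrase:
            silence_ms += cfg.frame_ms
            if silence_ms >= cfg.stop_ms:
                # Enough trailing silence: declare the phrase finished.
                endpoints.append((i + 1) * cfg.frame_ms)
                in_phrase = False
                silence_ms = 0
    return endpoints


# 200 ms of speech, a 300 ms pause, 200 ms more speech, then 1 s of silence.
flags = [True] * 10 + [False] * 15 + [True] * 10 + [False] * 50

print(find_endpoints(flags, EndpointConfig(stop_ms=200)))  # [400, 900]
print(find_endpoints(flags, EndpointConfig(stop_ms=800)))  # [1500]
```

With a 200 ms threshold the bot can start responding 200 ms after the speaker stops, but the 300 ms mid-sentence pause is treated as the end of a phrase. With an 800 ms threshold the utterance stays in one piece, at the cost of 800 ms added to every response.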
Here's a Llama 3 voice bot that has voice-to-voice response times of ~500ms.
We used @DeepgramAI's STT and TTS for this bot, and everything is hosted on @cerebriumai's serverless GPU infrastructure.
Live demo and link to source code here[1]
Technical write-up here[2]
HN discussion[3]