We have consistent ~500ms audio-in-to-first-token latency with a pipeline of…

May 13, 2024

We have consistent ~500ms audio-in-to-first-token latency with a pipeline of @DeepgramAI transcription -> @GroqInc LLama-3 -> Deepgram TTS now. So I think it's pretty realistic that a natively speech-to-speech model can get down to ~300ms.

We've been assuming this is where things will go. Tightly integrating the WebRTC transport layer with the inference infrastructure can help, too.

dax@thdxr

the new openai stuff looks cool and i’m definitely not trying to dismiss it

but a large part of what makes these demos impressive is the lack of latency - which i don’t see surviving in something public facing