May 13, 2024
We have consistent ~500ms audio-in-to-first-token latency with a pipeline of @DeepgramAI transcription -> @GroqInc LLama-3 -> Deepgram TTS now. So I think it's pretty realistic that a natively speech-to-speech model can get down to ~300ms.
We've been assuming this is where things will go. Tightly integrating the WebRTC transport layer with the inference infrastructure can help, too.
the new openai stuff looks cool and i’m definitely not trying to dismiss it
but a large part of what makes these demos impressive is the lack of latency - which i don’t see surviving in something public facing