October 3, 2024
Old 4o vs New 4o — a dialog between two generations of voice AI
Here's the demo I showed last night at the @cloudflare/@openai builders event.
This is two GPT-4o Voice AI bots talking to each other.
The first voice is coming from the phone and is powered by the standard Daily Bots demo app. It uses @DeepgramAI transcription, GPT-4o as the LLM, and @cartesia_ai for voice generation.
The second voice is GPT-4o voice-to-voice via the new OpenAI realtime API.
Turn detection settings are the same for both.
The silence duration parameter, which is very important, is set to 800ms. I chose this threshold on purpose, to try to trigger spurious interruptions. And, in fact, you can hear the old bot interrupt the new bot at 00:32.
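For anyone who wants to try the same threshold on the realtime side, here's a rough sketch of a `session.update` event that sets server-side turn detection. The field names (`turn_detection`, `server_vad`, `silence_duration_ms`) are what I understand the OpenAI realtime API to use; the helper name `build_turn_detection_update` is made up for illustration, and this is not the exact configuration from the demo.

```python
import json


def build_turn_detection_update(silence_duration_ms: int = 800) -> str:
    """Build a session.update event (as JSON text) that sets turn detection.

    Assumption: the event shape follows the OpenAI realtime API's
    server-side VAD settings. Only the silence duration is set here;
    every other setting is left at the server's default.
    """
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # How long the user must be silent before the turn is considered over.
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }
    return json.dumps(event)


# The JSON string you would send over the realtime WebSocket connection.
payload = build_turn_detection_update()
```

Raising the value makes the bot wait longer before replying, at the cost of feeling less responsive. Lowering it does the opposite and makes interruptions like the one at 00:32 more likely.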
What happened is that the new bot paused for just over 800ms. The "phrase endpointing" logic in Pipecat fired (correctly). The old bot started talking. The iPhone "ducked" the mic when the bot started talking, and after that either bot could have registered the interruption and stopped talking. The new bot happens to have done that first, before the old bot's mic levels normalized.
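To make the endpointing behavior concrete, here's a minimal sketch of silence-based phrase endpointing. It is not Pipecat's actual implementation. The class `SilenceEndpointer`, the 20ms frame size, and the helper functions are all assumptions for illustration, and the per-frame speech/silence decision is assumed to come from some upstream VAD.

```python
from dataclasses import dataclass


@dataclass
class SilenceEndpointer:
    """Toy endpointer: a phrase ends once silence lasts long enough."""

    silence_duration_ms: int = 800  # threshold discussed in the post
    frame_ms: int = 20              # assumed audio frame length
    _silent_ms: int = 0
    _in_speech: bool = False

    def process(self, is_speech: bool) -> bool:
        """Feed one frame's VAD decision; return True when the phrase ends."""
        if is_speech:
            self._in_speech = True
            self._silent_ms = 0
            return False
        if not self._in_speech:
            # Silence before any speech: nothing to end.
            return False
        self._silent_ms += self.frame_ms
        if self._silent_ms >= self.silence_duration_ms:
            self._in_speech = False
            self._silent_ms = 0
            return True
        return False


def speech(ms: int, frame_ms: int = 20) -> list:
    """Frames of continuous speech lasting roughly `ms` milliseconds."""
    return [True] * (ms // frame_ms)


def pause(ms: int, frame_ms: int = 20) -> list:
    """Frames of continuous silence lasting roughly `ms` milliseconds."""
    return [False] * (ms // frame_ms)


def count_endpoints(frames, **kwargs) -> int:
    """Count how many times the endpointer fires over a frame sequence."""
    endpointer = SilenceEndpointer(**kwargs)
    return sum(endpointer.process(f) for f in frames)


# A 600ms mid-sentence pause stays under the threshold; an 820ms pause does not.
short_pause = speech(1000) + pause(600) + speech(1000)
long_pause = speech(1000) + pause(820) + speech(1000)
```

With the 800ms threshold, `count_endpoints(short_pause)` is 0 and `count_endpoints(long_pause)` is 1. The second case is the situation in the demo: the speaker is not finished, but the pause runs just past the threshold, so the listener treats the phrase as complete and starts talking.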
The default GPT-4o realtime voice is called `alloy`. It talks relatively slowly, even with this prompt, which includes the instruction "talk quickly."
The @cartesia_ai voice here is "British Lady." She talks a bit more quickly, and her pauses tend to be a bit shorter.
I've listened to both these voices *a lot* so I'm not sure I can hear them objectively any more! I'm curious how these two different voices register with people interacting with these bots for the first time.
If you want to try this yourself, the Daily Bots demo (and source code) are here: https://t.co/yWEy4JkZWF
The OpenAI realtime voice pipeline that I was running from the command line on my laptop is here: https://t.co/SpteImJ7ue
If you're interested in a Python reference implementation for the OpenAI realtime API (and integration with client-side WebRTC SDKs), feel free to follow along with the draft PR: https://t.co/4hRtAAAron
Thanks, @lizziepika, @craigsdennis, @kevinwhinnery, and @romainhuet for hosting such a great event last night!
Also, if you're interested in real-time voice, come to the hackathon on Oct 19th/20th, in San Francisco and remote.