April 16, 2025
Can you beat my 1-929-LLM-GAME high score?
We've been exploring what you can do with speech-to-speech models.
Here's a word guessing game, built with the Gemini Multimodal Live API, Vercel, and Twilio, that has a bunch of interesting features ... 🧵
The web version of the game is here: https://t.co/jU78hAUP4A
Source code is here: https://t.co/gg9rMS0oxU
Clone this, modify it, run it locally, deploy your own version!
If you're a voice AI developer, you'll find the code for the game interesting to read through.
➡️ The game uses complex function calling and orchestration logic in combination with the Gemini Multimodal Live speech-to-speech API.
➡️ There are two different user experiences for the phone and the web.
➡️ The phone game has two parallel Gemini inference pipelines, one for the "judge" and one for the LLM "player." The judge hears everything the player says, but not vice versa.
➡️ The @krispHQ noise reduction model minimizes background speech so the LLM(s) can focus on what the human player is saying. This improves both response relevance and turn detection.
➡️ The code shows how to wire up a voice agent to a Twilio phone number for inbound calling. The Twilio pipeline uses Twilio Media Streams for voice transport.
➡️ The web game uses WebRTC for reliable, super low-latency voice transport.
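To make the judge/player split concrete, here is a minimal sketch of the one-way routing described above. It is not the game's actual code: `Pipeline`, `GameRouter`, and the method names are hypothetical stand-ins, text strings stand in for audio, and it assumes the human's speech goes to both pipelines and the judge's output goes only to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical stand-in for one speech-to-speech inference pipeline."""
    name: str
    heard: list = field(default_factory=list)  # (speaker, text) pairs this pipeline received

    def hear(self, speaker: str, text: str) -> None:
        self.heard.append((speaker, text))


class GameRouter:
    """Routes speech so the judge hears the LLM player, but not vice versa."""

    def __init__(self) -> None:
        self.player = Pipeline("player")
        self.judge = Pipeline("judge")
        self.to_caller = []  # what the human on the phone would hear

    def on_human_speech(self, text: str) -> None:
        # Assumption: the human's speech is delivered to both pipelines.
        self.player.hear("human", text)
        self.judge.hear("human", text)

    def on_player_speech(self, text: str) -> None:
        # The judge hears everything the LLM player says.
        self.judge.hear("player", text)
        self.to_caller.append(("player", text))

    def on_judge_speech(self, text: str) -> None:
        # One-way: judge output is never fed back into the player pipeline.
        self.to_caller.append(("judge", text))


router = GameRouter()
router.on_human_speech("It's a yellow fruit that monkeys like.")
router.on_player_speech("Is it a banana?")
router.on_judge_speech("Correct!")

print(router.judge.heard)   # includes the human's clue and the player's guess
print(router.player.heard)  # includes only the human's clue
```

The point of the asymmetry is that the player's context only ever contains the human's clues, so the judge's rulings can't leak into its guesses.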
My high score on the web game is 8. On the phone game it's 6.
Game: https://t.co/jU78hAUP4A
Code: https://t.co/gg9rMS0oxU
Phone: 929-LLM-GAME (+1-929-556-4263)
Credit to @mark_backman and @JonPTaylor for building this.
Thanks to @GoogleDeepMind, @krispHQ, @vercel, @twilio, and @pipecat_ai for all the great models and tools!
@altryne, if you have a chance to do the kid voice speech recognition test, I'd love to hear how it goes.