June 22, 2025
Another example of the parallel TTS pipelines pattern. Here's a voice AI agent that speaks both English and Arabic, with a dedicated TTS model and voice for each language.
These are @PlayAIOfficial voices. STT, TTS, and LLM inference all run on @GroqInc. (The LLM is…
Apologies that my terminal prints the Arabic backwards!
Here's the code. You can run this yourself. You just need a Groq API key.
https://t.co/plORUqcEkk
Here's the prompt. Llama 4 Maverick is pretty good at following these multi-language formatting instructions, but definitely not perfect. The prompt could use more work. Might be fun to use @DSPyOSS to improve it!
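To make the formatting point concrete: something downstream of the LLM has to split its output by language so each piece reaches the right TTS pipeline. Here's a minimal sketch of that step in plain Python. The `<en>…</en>` / `<ar>…</ar>` tag format and the voice names are hypothetical placeholders, not taken from the prompt or the linked code.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: the LLM is prompted to wrap each span in <en>...</en> or
# <ar>...</ar>. The real prompt may use a different convention.
TAG_RE = re.compile(r"<(en|ar)>(.*?)</\1>", re.DOTALL)

# ASSUMPTION: placeholder names, not real PlayAI voice IDs.
VOICES = {
    "en": "english-voice-placeholder",
    "ar": "arabic-voice-placeholder",
}


@dataclass
class Segment:
    lang: str
    text: str


def split_by_language(llm_output: str) -> list[Segment]:
    """Return the tagged spans in order, skipping empty ones."""
    segments = []
    for match in TAG_RE.finditer(llm_output):
        text = match.group(2).strip()
        if text:
            segments.append(Segment(lang=match.group(1), text=text))
    return segments


def route(segments: list[Segment]) -> list[tuple[str, str]]:
    """Pair each segment with the voice of the pipeline that should speak it."""
    return [(VOICES[s.lang], s.text) for s in segments]


if __name__ == "__main__":
    reply = "<en>The word for hello is</en><ar>مرحبا</ar>"
    for voice, text in route(split_by_language(reply)):
        print(voice, "->", text)
```

Anything the model emits outside a tag is silently dropped in this sketch, which is one reason imperfect instruction following matters here.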
Related thread from yesterday: using parallel TTS pipelines to narrate a story.
https://t.co/ZhypbkBVPB
The @OpenAI gpt-4o-mini-tts model is very good and a lot of fun to experiment with.
It's also different from the other TTS models we typically use for realtime voice AI applications. The model is steerable: you can tell it how to say things in addition to telling it what to say.
Here's an interactive story with two voices, each generated by a different configuration of gpt-4o-mini-tts.
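For a rough idea of what "two configurations" of the same model could look like: each voice is a voice name plus a steering instruction, and both share the model. This is a sketch, not the code behind the story demo. The voice choices and instruction text are illustrative assumptions, and the helper only builds the request parameters without calling the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceConfig:
    voice: str          # which built-in voice to use
    instructions: str   # HOW to say it (the steerable part)


# ASSUMPTION: illustrative voices and instructions, not the ones from the demo.
NARRATOR = VoiceConfig(
    voice="onyx",
    instructions="Speak slowly and warmly, like a calm storyteller.",
)
CHARACTER = VoiceConfig(
    voice="coral",
    instructions="Speak quickly, with wide-eyed excitement.",
)


def speech_request(config: VoiceConfig, text: str) -> dict:
    """Build keyword arguments for one TTS request (WHAT to say plus HOW)."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": config.voice,
        "input": text,
        "instructions": config.instructions,
    }


if __name__ == "__main__":
    print(speech_request(NARRATOR, "Once upon a time, there was a fox."))
    print(speech_request(CHARACTER, "I found the treasure!"))
```

With the official `openai` Python package, these keyword arguments should be usable as `client.audio.speech.create(**speech_request(NARRATOR, text))`, but check the current API reference before relying on that.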
