Kwindla Hultman Kramer

Another example of the multiple TTS parallel pipelines pattern

June 22, 2025

Here's a voice AI agent that speaks both English and Arabic, using a specific model/voice for each language.
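For readers who haven't seen the pattern, here's a rough sketch of the idea in plain asyncio. This is not the code from the linked repo. `TextSegment`, `fake_tts`, `tts_branch`, and the voice names are all made up for illustration, and the "TTS" just returns a labeled string instead of audio. Each branch sees every segment, drops the ones that aren't in its language, and synthesizes the rest with its own voice. The branches run concurrently and the results are merged back into the original order.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TextSegment:
    language: str  # e.g. "en" or "ar"
    text: str


async def fake_tts(voice: str, text: str) -> str:
    """Stand-in for a real TTS request (assumption): returns a label, not audio."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"[{voice}] {text}"


async def tts_branch(language: str, voice: str, segments: list) -> list:
    """One branch: keep only segments in `language`, synthesize them with `voice`."""
    spoken = []
    for index, segment in enumerate(segments):
        if segment.language == language:
            spoken.append((index, await fake_tts(voice, segment.text)))
    return spoken


async def speak(segments: list, voices: dict) -> list:
    """Run one branch per language concurrently, then restore the original order."""
    branches = [tts_branch(lang, voice, segments) for lang, voice in voices.items()]
    results = await asyncio.gather(*branches)
    merged = sorted(item for branch in results for item in branch)
    return [audio for _, audio in merged]


if __name__ == "__main__":
    demo = [TextSegment("en", "Hello!"), TextSegment("ar", "مرحبا")]
    # Voice names here are hypothetical placeholders.
    print(asyncio.run(speak(demo, {"en": "english-voice", "ar": "arabic-voice"})))
```

In a real pipeline the branches would stream audio frames rather than return a list, but the filter-per-branch shape is the same.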

These are @PlayAIOfficial voices. The STT, TTS, and LLM inference is all running on @GroqInc. (The LLM is…

Apologies that my terminal prints the Arabic backwards!

Here's the code. You can run this yourself. You just need a Groq API key.

https://t.co/plORUqcEkk

Here's the prompt. Llama 4 Maverick is pretty good at following these multi-language formatting instructions, but definitely not perfect. This prompt could use more work. Might be fun to use @DSPyOSS to improve this!
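To make "multi-language formatting instructions" concrete, here's a sketch of the parsing side. The tag format below (`<en>…</en>` and `<ar>…</ar>`) is an assumption for illustration only; the actual prompt may ask for something different. Because the model doesn't always follow the format, the sketch sends any untagged text to a default language instead of dropping it.

```python
import re

# Assumed output format (hypothetical): <en>...</en> and <ar>...</ar> wrappers.
TAG_PATTERN = re.compile(r"<(en|ar)>(.*?)</\1>", re.DOTALL)


def split_by_language(llm_output: str, default: str = "en") -> list:
    """Split LLM output into (language, text) pairs, in order of appearance."""
    segments = []
    position = 0
    for match in TAG_PATTERN.finditer(llm_output):
        # Text between tags means the model broke the format; keep it anyway.
        stray = llm_output[position:match.start()].strip()
        if stray:
            segments.append((default, stray))
        body = match.group(2).strip()
        if body:
            segments.append((match.group(1), body))
        position = match.end()
    tail = llm_output[position:].strip()
    if tail:
        segments.append((default, tail))
    return segments


if __name__ == "__main__":
    print(split_by_language("<en>Hello.</en> <ar>مرحبا</ar>"))
```

Each `(language, text)` pair can then be handed to whichever TTS voice handles that language.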

Related thread from yesterday: using parallel TTS pipelines to narrate a story.

https://t.co/ZhypbkBVPB

kwindla (@kwindla)

The @OpenAI gpt-4o-mini-tts model is very good and a lot of fun to experiment with.

It's also different from the other TTS models we typically use for realtime voice AI applications. The model is steerable: you can tell it how to say things in addition to telling it what to say.

Here's an interactive story with two voices, each generated by a different configuration of gpt-4o-mini-tts.
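A small sketch of what "two configurations" can look like. OpenAI's speech endpoint accepts an `instructions` field alongside `input` for this model, so one model can serve both voices. The voice choices and the instruction text below are illustrative guesses, not the settings used in the story demo, and `VoiceConfig` and `build_speech_request` are hypothetical helper names. The code only builds the request bodies; sending them (a POST to `https://api.openai.com/v1/audio/speech` with an API key) is left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceConfig:
    voice: str         # one of the API's built-in voice names
    instructions: str  # how to say it, separate from what to say


# Illustrative settings only (assumptions, not taken from the demo).
NARRATOR = VoiceConfig(
    voice="sage",
    instructions="Speak slowly and warmly, like a bedtime storyteller.",
)
CHARACTER = VoiceConfig(
    voice="ash",
    instructions="Speak quickly, in an excited, mischievous tone.",
)


def build_speech_request(config: VoiceConfig, text: str) -> dict:
    """Build a JSON body for the speech endpoint: same model, different steering."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": config.voice,
        "input": text,
        "instructions": config.instructions,
    }


if __name__ == "__main__":
    body = build_speech_request(NARRATOR, "Once upon a time...")
    print(json.dumps(body, indent=2))
```

Each configuration can then back its own TTS branch, with the story text routed to the narrator or the character as needed.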

Video from @kwindla's post
  1. https://github.com/daily-co/pcc-groq-llama/blob/main/bot-en-ar.py