April 11, 2025
Yes! We've worked with a number of people building language learning platforms and apps. In general, LLMs today are very good at handling multiple languages and translating between them (in text mode).
A couple of things to look at.
The newest speech-to-text models from @gladia_io have very good language auto-detection. Demo and sample code here:
📺 https://t.co/qveaGyTm1d
For text-to-speech, both the @cartesia_ai Sonic 2 model and @OpenAI's gpt-4o-mini-tts are good at handling mixed languages.
📺 https://t.co/GVCH5Zp1ig
For most use cases, a tts->llm->stt design is the most reliable and flexible way to build voice agents, today. *But* language learning is a place where the advantages of the unified speech-to-speech models really show! So it's also worth experimenting with the OpenAI Realtime API and the Google Multimodal Live API.
➡️ https://t.co/RbJA7dmPId
➡️ https://t.co/Acrbn051ci
📺 https://t.co/vrq7bQFmmK
@kwindla Thanks for sharing. Do you have any data / design for multi-lingual cases, like tutoring session where English speaker is tutored in French?