March 13, 2025
Further tinkering with my little French tutor app. This version uses the Gemini Multimodal Live API.
The speech understanding in Gemini is quite something.
In this video you can see Gemini correcting my pronunciation. (Very patiently.)
The language tutor use case really highlights the strengths of a next-generation speech model like Gemini.
The app is 90 lines of @pipecat_ai code and uses WebRTC for super low-latency, super reliable network transport.
Code is here[1]
Here are the docs for Pipecat's open-source client SDKs for JavaScript, React, iOS, Android, React Native, and C++[2]
The Pipecat clients all support WebRTC, WebSocket, and HTTP network transports.