August 29, 2025
I got a bunch of questions about the cost of the Realtime API yesterday after posting this.
tldr: OpenAI has followed their usual (and much appreciated) path of cutting the pricing of the Realtime API with every release. Cost is now about $0.04/minute of speech-to-speech time, factoring in the implicit token caching.
But note: you generally do not get charged for non-talking time, because OpenAI's voice activity detection filters out non-speech input. For a use case like voice programming, you're probably only talking 5% of the time and the dictation assistant's output is extremely brief, so it's maybe $0.20/hour to stay connected to the Realtime API the whole time you're programming. (ymmv)
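For anyone who wants to plug in their own numbers, here is a minimal back-of-envelope sketch of that arithmetic. The function name and parameters are made up for illustration, and it assumes a flat per-minute rate charged only on minutes that contain speech, which is a simplification of the real token-based billing.

```python
def hourly_cost(rate_per_talk_minute: float, talk_fraction: float) -> float:
    """Estimate dollars per connected hour, counting only minutes with speech.

    Assumption: silence costs nothing and every talking minute is billed
    at the same flat rate (the real API bills by tokens).
    """
    talk_minutes = 60 * talk_fraction
    return talk_minutes * rate_per_talk_minute


# Figures from the post: about $0.04 per speech-to-speech minute,
# talking roughly 5% of the time.
estimate = hourly_cost(rate_per_talk_minute=0.04, talk_fraction=0.05)
print(f"${estimate:.2f}/hour")
```

With those inputs the sketch counts 3 talking minutes per hour. It only models talk time at the flat rate, so treat the result as a floor next to the looser "maybe $0.20/hour" figure, not as a replacement for the spreadsheets.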
Cost calculation spreadsheets below ...
Voice-only programming with the new OpenAI Realtime API ...
I spend a lot of time these days pair programming with LLMs. Often I'm talking rather than typing.
This "voice dictation" use case has become an important vibe benchmark for me. Being able to create text input just by talking, flexibly, in a context-dependent way, with tool calling, is a *hard* problem for today's models.
Natural language dictation requires a very high degree of contextual intelligence, instruction-following accuracy, and tool-calling reliability.
Today's new gpt-realtime model is quite good at this hard problem.
The original realtime model release last year was impressive. Seeing what a speech-to-speech model could do got a lot of people excited about the possibilities of voice AI. The improvements since that first release are equally impressive. I can now use this new model for real-world tasks that were past the edge of the "jagged frontier" before.
Here's a video showing a couple of fun (and tricky) modes of voice input.

Realtime API cost calculator [1]
Generic voice agent cost calculator spreadsheet [2]
Interactive cost calculator by @anarchyco [3]