OpenAI announced new automatic caching for Realtime API tokens, and an 80%…

October 31, 2024

OpenAI announced new automatic caching for Realtime API tokens, and an 80% price reduction for cached audio tokens.

I updated our cost calculator spreadsheet, both to reflect the new cost reduction from caching and to revisit some assumptions about typical talk/turn times based…

Here's the link. You can copy this sheet and edit it: https://t.co/UubNGDp8j5

A quick recap, in case you missed the thread from earlier today that @altryne started.

https://t.co/RLnKrFs0R0

The Realtime API keeps its own internal copy of the entire conversation history for a session.

For each "turn" in the conversation, the entire history plus the newest user input is used for inference. (You can see this in the usage metrics that the API reports.)

- Total input audio for a typical 1-minute conversation (with three conversational turns and 30% non-speech time) will be just over a minute.
- Total input audio for a typical 5-minute conversation will be ~26 minutes.
- Total input audio for a typical 10-minute conversation will be ~100 minutes!
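A minimal sketch of where numbers like these come from. The 30% non-speech time and three turns per minute follow the 1-minute example above; the even split between user and assistant speech, and the function name, are assumptions made for illustration and may not match the spreadsheet exactly.

```python
def total_input_audio_seconds(minutes, turns_per_minute=3,
                              non_speech=0.30, user_share=0.5):
    """Sum the audio sent to inference across all turns of one session.

    Assumptions (illustrative, not taken from the spreadsheet):
    - turns are evenly sized
    - user_share of each turn's speech is the user, the rest the assistant
    - every turn re-processes the full history plus the new user audio
    """
    turns = round(minutes * turns_per_minute)
    speech_per_turn = minutes * 60 * (1 - non_speech) / turns
    new_user_audio = speech_per_turn * user_share

    total = 0.0
    history = 0.0  # seconds of audio already in the session history
    for _ in range(turns):
        total += history + new_user_audio  # input for this turn's inference
        history += speech_per_turn         # user + assistant audio join the history
    return total


for minutes in (1, 5, 10):
    seconds = total_input_audio_seconds(minutes)
    print(f"{minutes:>2}-minute conversation: {seconds / 60:.1f} minutes of input audio")
```

Under these assumptions the total grows with the square of the number of turns, which is why a 10-minute conversation costs far more than ten 1-minute ones.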

Audio uses a lot of tokens: about 800 per minute of speech. So, without caching, costs add up quickly in long conversations.

Together, the new automatic token caching and the 80% price drop for cached audio tokens mean big, big cost savings for long conversations: ~60% for a 5-minute conversation and ~70% for a 10-minute conversation.
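To see why the savings grow with conversation length, here is a rough sketch of the input-audio side only. It assumes three evenly sized turns per minute, 30% non-speech time, an even user/assistant split, and that every token already in the history is a cache hit. Those are illustrative assumptions, and the function names are hypothetical. Prices are relative units, so no dollar figures are assumed.

```python
TOKENS_PER_MINUTE = 800  # approximate figure quoted in the post
CACHED_DISCOUNT = 0.80   # 80% price reduction for cached audio tokens


def input_token_split(minutes, turns_per_minute=3,
                      non_speech=0.30, user_share=0.5):
    """Return (fresh_tokens, replayed_tokens) summed over all turns.

    fresh    = new user audio, never seen before, always full price
    replayed = history re-sent on each turn, assumed here to always hit the cache
    """
    turns = round(minutes * turns_per_minute)
    speech_per_turn = minutes * (1 - non_speech) / turns  # in minutes
    fresh = replayed = 0.0
    history = 0.0  # minutes of audio already in the session history
    for _ in range(turns):
        replayed += history * TOKENS_PER_MINUTE
        fresh += speech_per_turn * user_share * TOKENS_PER_MINUTE
        history += speech_per_turn
    return fresh, replayed


def input_cost_savings(minutes, **kwargs):
    """Fraction of input-audio cost saved when the history is billed at the cached rate."""
    fresh, replayed = input_token_split(minutes, **kwargs)
    uncached_cost = fresh + replayed
    cached_cost = fresh + replayed * (1 - CACHED_DISCOUNT)
    return 1 - cached_cost / uncached_cost


for minutes in (1, 5, 10):
    print(f"{minutes:>2} min: {input_cost_savings(minutes):.0%} saved on input audio")
```

This idealized model gives roughly 75% for 5 minutes and 77% for 10 minutes on input audio alone. The whole-conversation figures quoted above are lower, presumably because the spreadsheet also counts costs that caching does not discount, such as output audio.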