Quick PSA. Settings for minimizing GPT-5 latency (time to first token)

August 8, 2025

Quick PSA. Settings for minimizing GPT-5 latency (time to first token).

"service_tier": "priority", "reasoning_effort": "minimal", "verbosity": "low".

P50 TTFT with these settings is ~750ms. With the defaults, it's >3s.

The default settings are the right starting point for most use cases. It's *good* that this model can think proactively. As @swyx says, "It's a good model, ser."

For use cases where you care a lot about TTFT, use the above settings.

Posting this here because I've answered this question a bunch of times today in various DMs and channels.

^[1]

https://gist.github.com/kwindla/874a880e444073aa3c1ad84801d8842e ↩