August 8, 2025
Quick PSA. Settings for minimizing GPT-5 latency (time to first token).
"service_tier": "priority", "reasoning_effort": "minimal", "verbosity": "low".
P50 TTFT with these settings is ~750ms. With the defaults, it's >3s.
The default settings are the right starting point for most use cases. It's *good* that this model can think proactively. As @swyx says, "It's a good model, ser."
For use cases where you care a lot about TTFT, use the above settings.
Posting this here because I've answered this question a bunch of times today in various DMs and channels.