
OpenAI GPT-5 release

Introduction

OpenAI has released the much anticipated GPT-5 model; the System Card is available. It comes in four flavors: nano, mini, chat, and gpt-5 “proper”. Some of these were beta-tested through the OpenRouter platform under the aliases “Horizon Alpha” and “Horizon Beta”. The Using GPT-5 document details features and includes a migration guide from previous models.

Horizon Alpha/Beta

The models showed two particularities in my testing:

  • gender bias like GPT-4.1: when asked to assign names to terms, as highlighted in the Stanford AI Index, Horizon Alpha and Beta both showed this bias - which I haven’t seen with GPT-4o. GPT-5 with minimal reasoning effort (see below) shows this bias as well; with medium effort it alternates.
  • thinking: both models deliberated on the solution within the comments of the source code they produced, leading to a lot of unhelpful clutter in those comments

OpenRouter confirms that both models were early checkpoints of the GPT-5 family.

Reasoning switch

As established for other reasoning models, the length of the reasoning process can be somewhat shortened. GPT-5 makes this explicit by extending the reasoning effort parameter to include “minimal”. In addition, the verbosity level is now configurable, allowing outputs aside from the reasoning process to be more terse:

When generating code, medium and high verbosity levels yield longer, more structured code with inline explanations, while low verbosity produces shorter, more concise code with minimal commentary.

(quote from Using GPT-5).
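
For illustration, here is a minimal sketch of setting both knobs via the Responses API; the reasoning.effort and text.verbosity parameter shapes are taken from the GPT-5 developer documentation, everything else (prompt, expected output) is just an example:

```python
from openai import OpenAI

client = OpenAI()

# Ask GPT-5 to keep both the reasoning process and the final answer short:
# "minimal" reasoning effort plus "low" output verbosity.
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    text={"verbosity": "low"},
    input="Write a Python function that reverses a string.",
)

print(response.output_text)
```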

To use GPT-5 without any reasoning in the API, the Introducing GPT-5 for developers document recommends:

The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.

So the “gpt-5-chat” model in the API seems to be what’s called “gpt-5-main” in the System Card? In contrast to the other models in the API, this one is not versioned - from the Model Card: “GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT.”
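
A quick sketch of calling that non-reasoning model directly, assuming gpt-5-chat-latest is exposed through the Chat Completions endpoint as the developer announcement suggests:

```python
from openai import OpenAI

client = OpenAI()

# gpt-5-chat-latest: the non-reasoning snapshot used in ChatGPT,
# unversioned and updated in place (per the Model Card).
completion = client.chat.completions.create(
    model="gpt-5-chat-latest",
    messages=[
        {"role": "user", "content": "Summarize the GPT-5 System Card in one sentence."},
    ],
)

print(completion.choices[0].message.content)
```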

API vs ChatGPT

Some of the things presented on the live stream are not available in the API:

  • GPT-5 Pro: the model gpt-5-thinking-pro, as the migration path from o3-pro, is exclusive to ChatGPT Pro: “In ChatGPT, we also provide access to gpt-5-thinking using a setting that makes use of parallel test time compute; we refer to this as gpt-5-thinking-pro.” (quote from the System Card)
  • the improved audio input/output

The API gives explicit control over which model to use, however. In ChatGPT, a model router takes control of that, and users may have to signal which route to take (“think hard about this”). Per the System Card, the model router will continue to be trained “on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time”. This means that ChatGPT users will see inconsistent results that change over time.

Image generation in both the API and ChatGPT will continue to be provided by gpt-image-1 (which was recently improved by the addition of a “High” input fidelity option).
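
On the API side, the “High” input fidelity option appears as an input_fidelity parameter on the image edits endpoint; the following is a sketch under that assumption (file names and prompt are made up):

```python
import base64
from openai import OpenAI

client = OpenAI()

# Edit an existing image with gpt-image-1, asking the model to preserve
# fine detail of the input via the "high" input fidelity setting.
result = client.images.edit(
    model="gpt-image-1",
    image=open("photo.png", "rb"),
    prompt="Add a red scarf to the person in the photo.",
    input_fidelity="high",
)

# gpt-image-1 returns base64-encoded image data.
with open("photo_edited.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```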

ChatGPT users continue to be downgraded, depending on the subscription they have bought:

Once usage limits are reached, a mini version of each model handles remaining queries.

(quoted from the System Card). During the live stream announcement, it was clarified that ChatGPT Pro users receive unlimited use of GPT-5.

For developers

The announcement live stream included several demos of using GPT-5 for coding. GPT-5 improves strongly on a number of coding-related benchmarks, including Aider Polyglot. Windsurf and Cursor provide free use of GPT-5 for a limited time. OpenAI’s own Codex CLI now uses GPT-5 by default, with usage covered by a ChatGPT subscription. There’s a “curated collection of demo applications generated entirely in a single GPT-5 prompt, without writing any code by hand”: the GPT-5 coding examples GitHub repository.

Pricing

Simon Willison has a nice comparison table, noting that GPT-5 is cheaper per token than GPT-4o and 4.1, but costlier than o4-mini, especially on outputs. (As always, output lengths vary in the number of tokens used, as tokens are not a standard unit of measure.) Cost savings through implicit input caching are substantial: cached input tokens cost about 1/10 of the regular price (90% off), i.e., $0.125 vs. $1.25 per 1M input tokens with GPT-5 “proper”.
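
A back-of-the-envelope illustration of what that discount means for a large, repeatedly sent prompt (the per-1M-token prices are the only inputs; the token count is a made-up example):

```python
# Illustrative cost comparison for GPT-5 "proper" input tokens,
# assuming $1.25 per 1M uncached and $0.125 per 1M cached input tokens.
PRICE_UNCACHED = 1.25 / 1_000_000   # USD per input token
PRICE_CACHED = 0.125 / 1_000_000    # USD per cached input token (90% off)

prompt_tokens = 50_000  # e.g., a large system prompt resent on every request

print(f"uncached: ${prompt_tokens * PRICE_UNCACHED:.5f}")  # uncached: $0.06250
print(f"cached:   ${prompt_tokens * PRICE_CACHED:.5f}")    # cached:   $0.00625
```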

Availability

GPT-5 is currently rolling out to paid ChatGPT plans including Team and Pro, but the rollout is not yet complete (2025-08-07 13:52 PDT). Availability on the API is granted to all Tiers, with full support in the OpenAI Prompts Playground and basic support through my OAI Chat. It’s also rolling out to Microsoft platforms, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry.

Open questions

  • What’s the time-to-first-token, given that there is no true no-thinking model or mode in the API? Does it suffer? (See the measurement sketch below.)
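
One rough way to probe this - a sketch that assumes the Python SDK’s Responses streaming helper and its response.output_text.delta event type - is to time the first streamed output token at minimal reasoning effort:

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.monotonic()
first_token_at = None

# Stream a short request and record when the first output token arrives.
with client.responses.stream(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input="Say hello.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta" and first_token_at is None:
            first_token_at = time.monotonic()

print(f"time to first token: {first_token_at - start:.2f} s")
```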

This post will be updated