
OpenAI GPT-OSS open weights model released

Intro

OpenAI has released the open-weights model that was announced at the end of March. It is a reasoning model that comes in two sizes, 20B and 120B, both in a Mixture-of-Experts configuration. Model Card. Commenters call it “similar to o3” in performance, but that seems untrue: it is roughly o3-mini grade and lags behind o4-mini.

Providers

  • for quick testing, there’s gpt-oss.com. This does not allow one to specify a System/Developer prompt, though
  • Third party “Inference Providers”
    • beware: accuracy can be inconsistent! Romain Huet, Head of Developer Experience at OpenAI: “Heads-up for developers trying gpt-oss: performance and correctness can vary a bit across providers and runtimes right now due to implementation differences. We’re working with inference providers to make sure gpt-oss performs at its best everywhere, and we’d love your feedback!”. He adds: “We’ve verified the vLLM implementation so there should be no issues with that. If you run into issues let us know.”
    • OpenRouter has support (see the API sketch after this list)
    • Awesome GPT-OSS lists Cloudflare and Groq
    • Announcements for Azure AI Foundry and AWS Bedrock make it appear the model is available there as of now, but neither actually offers it yet.
  • Local: the GitHub repo lists several options, including vLLM, Hugging Face Transformers, and Ollama.
    • Transformers doesn’t seem viable on Apple Silicon as of now, though (see below)
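
To make the provider route concrete, here is a minimal sketch of calling gpt-oss through OpenRouter’s OpenAI-compatible endpoint. The base URL is OpenRouter’s documented one; the model slug openai/gpt-oss-120b is an assumption to verify against the OpenRouter model listing.

import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the standard OpenAI client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed slug; check OpenRouter for the exact name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the Mixture-of-Experts idea in two sentences."},
    ],
)
print(response.choices[0].message.content)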

Gotchas on macOS

For reference, running the Transformers serve command as given in the Inference examples fails straight away on macOS:

% transformers serve

ImportError: Missing dependencies for the serving CLI. Please install with `pip install transformers[serving]`

… which in turn fails with:

% pip install transformers[serving]

zsh: no matches found: transformers[serving]

The trick here is to quote the package name so that zsh does not try to expand the square brackets as a glob pattern:

% pip install "transformers[serving]"

Ultimately though, the MPS backend for Apple Silicon doesn’t seem to be supported: transformers serve will start up fine and transformers chat will make a connection, but the server will fail at runtime in the background:

  File "/Users/ndurner/gpt-oss/.venv/lib/python3.13/site-packages/transformers/quantizers/quantizer_mxfp4.py", line 60, in validate_environment
    raise RuntimeError("Using MXFP4 quantized models requires a GPU")
RuntimeError: Using MXFP4 quantized models requires a GPU

As the OpenAI Cookbook on gpt-oss with Ollama explicitly notes:

“Perfect for higher-end consumer GPUs or Apple Silicon Macs” … this seems to be the way.
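
As a minimal sketch of that route: Ollama exposes an OpenAI-compatible endpoint locally, so the earlier client sketch works with just the base URL and model name changed. The tag gpt-oss:20b is an assumption; verify it with ollama list after pulling the model.

from openai import OpenAI

# Talks to a local Ollama instance; assumes `ollama pull gpt-oss:20b` has been run.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # Ollama ignores the key, but the client needs a value
)
response = client.chat.completions.create(
    model="gpt-oss:20b",  # assumed tag; check `ollama list` for the exact name
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(response.choices[0].message.content)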

Insights

The open-source stack released along with the models gives some additional insights. I have noticed before that the o1 model likes to be addressed as “ChatGPT” in prompts, not as “o1”. This is confirmed by the “You are ChatGPT” identity prompt published as part of the Harmony prompt template harness (Cookbook, GitHub repo).
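
For illustration, a hand-assembled sketch of a Harmony-style prompt with that identity line. The “You are ChatGPT” wording is from the published template; the remaining header fields and special tokens are reproduced from memory and should be checked against the Cookbook and the GitHub repo.

# Illustrative only: builds a Harmony-style prompt string by hand instead of using
# the official harness; header fields and tokens are assumptions to verify.
system_message = (
    "<|start|>system<|message|>"
    "You are ChatGPT, a large language model trained by OpenAI.\n"
    "Knowledge cutoff: 2024-06\n"
    "Reasoning: medium\n"
    "# Valid channels: analysis, commentary, final."
    "<|end|>"
)
user_message = "<|start|>user<|message|>Hello!<|end|>"
prompt = system_message + user_message + "<|start|>assistant"  # primes the assistant turn
print(prompt)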