
Notes on OpenAI Codex CLI

Upfront summary: Codex CLI is labelled an “experimental project”, and it certainly is one: handing off whole, convoluted development tasks for automagic completion has not arrived yet.

  • usage is billed against API credits
  • requests/responses are logged to the OpenAI Platform Logs, which are accessible through the Dashboard
  • Codex CLI defaults to the new “codex-mini-latest” model
    • codex-mini-latest is:
      • also available via the API, but not offered through the Playground UI (see the API sketch after this list)
      • slightly more expensive than o4-mini on a per-token basis, though real-world costs may differ
    • the model used by Codex can be changed through the “/model” command
      • o3 and o4-mini are run with “high” reasoning effort according to the logs, and are thus particularly expensive to run
  • supports many LLM providers, but neither Amazon Bedrock nor Anthropic Claude
  • a rewrite of Codex CLI in Rust has been announced
    • so forking it and adding Bedrock support may not be the best investment currently
  • local resource access is facilitated by the local_shell hosted tool
  • does not access the web, so it tends to make up APIs that don’t exist
  • the general lore that coding agents can get costly quickly seems true, at least with the o3 model: a request to update a GitHub Pages blog post cost > $4.
    • insight: all interactions and shell input/output are inlined in the main session and are thus carried along in the context
    • I had expected implicit caching to alleviate the cost of the growing context, but apparently it does not help enough to make the use of o3 in Codex reasonably cheap (see the cost sketch after this list)
  • generally lazy vibe when updating > 100 blog posts, with poor instruction following: models like to read files only partially (read the first n lines, summarize only the first paragraph, …) and generally had to be told that it is an LLM itself, with all the capabilities that implies. o4-mini seemed best, but frequently went off the rails by claiming the task was “too manually intensive”, asking for feedback, etc. Once it broke off and unilaterally created several scripts that would call out to an external LLM!
  • Codex CLI, like Google Jules, was overwhelmed by undoing a historical API switch and implementing a new feature based on the new-old API while keeping everything else intact.
  • a simple coding task on an existing code base, with sufficient detail given as context, appeared successful, but I did not verify it in depth
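
As a concrete illustration of the two model notes above (API availability and “high” reasoning effort), here is a minimal sketch using the official OpenAI Python SDK and the Responses API. The prompts are made up, and mapping the logged effort level to the reasoning parameter is my inference, not confirmed Codex CLI behavior:

```python
# Minimal sketch: calling codex-mini-latest directly via the API,
# since it is not selectable in the Playground UI. Expects
# OPENAI_API_KEY in the environment; prompts are made-up examples.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="codex-mini-latest",
    input="Write a shell one-liner that counts Markdown files in a directory tree.",
)
print(response.output_text)

# The "high" reasoning effort the logs show for o3/o4-mini corresponds
# to the Responses API's reasoning parameter, e.g.:
response = client.responses.create(
    model="o4-mini",
    reasoning={"effort": "high"},  # more reasoning tokens, higher cost
    input="Same task, but reason carefully about edge cases.",
)
print(response.output_text)
```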
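And a back-of-the-envelope sketch of the cost dynamics: when every turn inlines the full prior transcript, cumulative input tokens grow roughly quadratically with the number of turns, and even a steep cached-input discount only softens this. All prices and token counts below are illustrative placeholders, not measured values:

```python
# Back-of-the-envelope sketch: each turn resends the whole session
# history, so cumulative input tokens grow roughly quadratically.
# All numbers are illustrative placeholders, not measured values.

TOKENS_PER_TURN = 4_000       # assumed prompt + tool output added per turn
INPUT_PRICE_PER_M = 10.0      # assumed $ per 1M input tokens (o3-class)
CACHED_FRACTION_PRICE = 0.25  # assumed cached-input price vs. full price

def session_cost(turns: int, cache_hit_rate: float = 0.0) -> float:
    """Estimate input-token cost when each turn carries the entire
    transcript so far in its context."""
    total_cost = 0.0
    context_tokens = 0
    for _ in range(turns):
        context_tokens += TOKENS_PER_TURN
        # blend full-price and discounted cached tokens
        effective_tokens = context_tokens * (
            (1 - cache_hit_rate) + cache_hit_rate * CACHED_FRACTION_PRICE
        )
        total_cost += effective_tokens * INPUT_PRICE_PER_M / 1_000_000
    return total_cost

for n in (5, 10, 20):
    print(f"{n:>2} turns: ${session_cost(n):.2f} uncached, "
          f"${session_cost(n, cache_hit_rate=0.9):.2f} at 90% cache hits")
```

Even with a 90% cache-hit rate in this toy model, input-side cost still grows with the square of the session length, which matches the impression that long o3 sessions get expensive quickly.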