Upfront summary: Codex CLI is labelled an “experimental project”, and it certainly is: handing off whole, convoluted development tasks for automagic completion has still not arrived.
- usage is billed against API credits
- a free usage allowance can be redeemed by users of certain paid ChatGPT subscriptions or through data sharing
- the ChatGPT subscription needs to be active for more than 7 days, though
- requests/responses get logged to the OpenAI Platform Logs that are accessible through the Dashboard
- Codex CLI defaults to the new “codex-mini-latest” model
- codex-mini-latest is:
- also available via the API, but not offered through the Playground UI (a minimal call is sketched after this list)
- slightly costlier than o4-mini on a per-token basis, though real-world costs could differ
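Since the model is not in the Playground, the quickest way to poke at it directly is a plain Responses API call. A minimal sketch using the official openai Python package (the prompt is made up; model availability may change):

```python
# Minimal sketch: calling codex-mini-latest directly via the Responses API.
# Requires `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="codex-mini-latest",
    instructions="You are a coding assistant working inside a git repository.",
    input="Write a shell one-liner that lists the ten largest files below the current directory.",
)

print(response.output_text)
```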
- the model used by Codex can be changed through the “/model” command
- o3 and o4-mini are run with “high” reasoning effort according to the logs, which makes them particularly expensive (see the sketch below)
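That effort level is something you can also set explicitly when calling the reasoning models via the API yourself; “high” buys more reasoning tokens, which are billed as output tokens and presumably account for much of the cost. A minimal sketch (again with the openai Python package, prompt made up):

```python
from openai import OpenAI

client = OpenAI()

# "high" maximizes reasoning tokens, which are billed as output tokens;
# lower effort settings trade answer quality for cost and latency.
response = client.responses.create(
    model="o3",
    reasoning={"effort": "high"},
    input="Why does this recursive function overflow the stack for n > 10000?",
)

# the usage object breaks out how many output tokens went into reasoning
print(response.usage.output_tokens_details.reasoning_tokens)
```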
- Codex CLI supports many LLM providers, but neither Amazon Bedrock nor Anthropic Claude
- a rewrite of Codex CLI in Rust has been announced
- so forking it and adding Bedrock support may not be the best investment, currently
- local resource access is facilitated by the `local_shell` hosted tool (exclusive to codex-mini-latest, at least through the API); a sketch of the call loop follows below
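As far as I can tell from the API documentation, the tool is enabled by passing `{"type": "local_shell"}` in `tools`; the model then emits `local_shell_call` items whose commands the client is expected to execute and report back. A rough sketch of one round of that loop (field names taken from the Responses API docs; sandboxing, error handling, and the full agent loop omitted):

```python
import subprocess

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="codex-mini-latest",
    tools=[{"type": "local_shell"}],
    input="Show the current git status of this repository.",
)

# Instead of text, the model may answer with local_shell_call items;
# the client runs each command and feeds the output back. A real agent
# would repeat this until the model stops issuing calls.
for item in response.output:
    if item.type == "local_shell_call":
        result = subprocess.run(
            item.action.command, capture_output=True, text=True, timeout=60
        )
        response = client.responses.create(
            model="codex-mini-latest",
            tools=[{"type": "local_shell"}],
            previous_response_id=response.id,
            input=[{
                "type": "local_shell_call_output",
                "call_id": item.call_id,
                "output": result.stdout + result.stderr,
            }],
        )

print(response.output_text)
```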
- it does not access the web, so it tends to make up APIs that don’t exist
- Eleanore Berger’s approach of doing AI-assisted planning upfront seems like a good workaround
- the general lore that coding agents can get costly quickly seems true, at least with the o3 model: a request to update a GitHub Pages blog post cost > $4.
- insight: all interactions and shell input/output are inlined in the main session and are thus carried along in the context
- I had expected implicit caching to alleviate the cost of the growing context, but apparently not enough to make the use of o3 in Codex reasonably cheap (see the back-of-the-envelope sketch below)
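A back-of-the-envelope sketch of why this hurts: if every turn replays the full transcript as input, billed input tokens grow roughly quadratically with the number of turns, and even a generous cache discount only shaves off part of that. All numbers below are hypothetical placeholders, not actual prices:

```python
# Back-of-the-envelope: cumulative input cost of an agent session where
# every turn resends the entire transcript. All numbers are hypothetical.
PRICE_PER_MTOK = 10.0    # input price in $ per million tokens (placeholder)
CACHE_DISCOUNT = 0.75    # assumed discount on cached prefix tokens
TOKENS_PER_TURN = 4_000  # new tokens per turn (prompt + shell output + reply)
TURNS = 30

total_cost = 0.0
context = 0
for _ in range(TURNS):
    # the existing context is (ideally) served from the prompt cache,
    # while the newly added tokens are billed at the full rate
    billed = context * (1 - CACHE_DISCOUNT) + TOKENS_PER_TURN
    total_cost += billed * PRICE_PER_MTOK / 1e6
    context += TOKENS_PER_TURN

print(f"~{context:,} tokens of final context, ~${total_cost:.2f} input cost")
```

Even with a generous cache hit discount, the quadratic transcript growth dominates after a few dozen turns.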
- generally a lazy vibe when updating > 100 blog posts, with poor instruction following: the models like to read files only partially (read the first n lines, summarize only the first paragraph, …) and generally had to be reminded that they are themselves LLMs with all the corresponding capabilities. o4-mini seemed best, but frequently went off the rails by claiming the task was “too manually intensive”, asking for feedback, etc. Once it broke off and unilaterally created several scripts that would call out to an external LLM!
- Codex CLI, like Google Jules, was overwhelmed by undoing a historical API switch and implementing a new feature on top of the restored old API while keeping everything else intact.
- a simple coding task on an existing code base, with sufficient detail given as context, appeared successful, but I did not verify the result in depth