ChatGPT Agent, an agentic environment within ChatGPT, has been rolled out to paid subscribers in the EEA and Switzerland (excluding Enterprise and Edu users). This combines the OpenAI Operator technology preview with Deep Research, a terminal tool for code execution and external data sources via Connectors. Operator will be deprecated in the coming weeks, as stated in the OpenAI Knowledge Base article. According to the System Card, ChatGPT Agent is “a new agentic model in the same family as OpenAI o3”. Notably, the general-purpose o3 model scores better on some benchmarks, including Hallucination (SimpleQA) and Fairness. Thus, ChatGPT Agent has its advantages but may not be universally superior and is considerably slower (see below).
Under the Hood
Browser-use within Agent Mode functions by capturing screenshots and allowing o3 to determine which computer actions to perform. This is similar to my Aileen prototype and Anthropic’s Computer-Use, yet differs from the open-source project Browser-use and Amazon ReAct. Additionally, the capabilities via the OpenAI API seem limited, as the Computer-Use model apparently hasn’t been upgraded from the 4o model to o3. The Deep Research models remain distinct, with only a limited Web Search tool. While the user has taken over the browser, ChatGPT won’t “see” user inputs, which safeguards sensitive information such as passwords. Once it regains control, the model first needs to interpret the current browser state.
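To make the screenshot-driven approach more tangible, here is a minimal Python sketch of an observe-decide-act loop. It is not OpenAI's implementation: `FakeBrowser`, `Action`, `scripted_policy` and `run_agent` are hypothetical names invented for illustration, the "screenshot" is a plain string rather than an image, and the policy is a hard-coded stand-in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type" or "done" (hypothetical action set)
    target: str = ""   # free-text description of the UI element
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; its state is just a log of applied actions."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; here the observation is a text summary.
        return " | ".join(self.log) or "blank page"

    def apply(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.target}:{action.text}")


def scripted_policy(observation: str) -> Action:
    """Hard-coded stand-in for the model: decides the next action from the observation only."""
    if "type:" in observation:
        return Action("done")
    if "click:" in observation:
        return Action("type", "search box", "QES")
    return Action("click", "search box")


def run_agent(browser: FakeBrowser, policy: Callable[[str], Action], max_steps: int = 10) -> int:
    """Observe, decide, act; returns the number of observations that were needed."""
    for step in range(1, max_steps + 1):
        observation = browser.screenshot()   # state is re-read on every step
        action = policy(observation)
        if action.kind == "done":
            return step
        browser.apply(action)
    return max_steps


browser = FakeBrowser()
steps = run_agent(browser, scripted_policy)
print(steps, browser.log)
```

Because the loop carries no state besides the latest observation, anything a user changes during a takeover only becomes visible to the policy through the next screenshot. That is consistent with the model having to re-interpret the browser state after it regains control.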
Experiments
Delayed Baggage Reimbursement
I replicated the AI-first example use-case of filing an expense claim for delayed baggage. Instead of manually seeking and pasting required information from multiple documents, I:
- Uploaded PDF documents and an Apple Music receipt (for additional personal data) to ChatGPT.
- Activated the “Agent” mode from the ChatGPT tools:
- Kept the model as 4o instead of switching to o3, as it does not affect this function.
- Prompted: “Prepare the SWISS form for cost reimbursement due to delayed baggage, based on the information in the files attached. (Use the Apple Music receipt only to get additional personal data about myself. This wasn’t an actual expense to be claimed.)”
The agent began by setting up its virtual desktop environment:
and reviewed the provided materials:
The o3 model correctly recognized that the crossed-out mineral water item should be excluded from the claim, and arrived at the correct total.
Next, it searched the web for the claims form. Unlike OpenAI Operator, which prefers Bing’s first result, ChatGPT Agent Mode uses its integrated search:
After dismissing overlays like cookie consent and a chatbot, it started work on the form:
(pink redactions mine)
File uploads were performed automatically, correctly excluding the Apple Music receipt as requested:
Two issues arose. First, after I took control to review the form and then closed it, ChatGPT misinterpreted this as explicit consent and automatically submitted the form:
Second, my Miles & More frequent flier number was not included in the form, as it was not evident in the materials.
SWISS confirmed successful submission. Overall, ChatGPT completed the task within 15 minutes.
Web Research
Eleanor Berger reported a failed research task involving creating an overview of a specific restaurant type (Prompt). A simpler research task previously attempted with Operator - “Where can I buy QES?” - yielded relevant results efficiently, unlike Operator’s earlier tangential outcomes. OpenAI’s proprietary search integration seems to have improved performance. Interestingly, ChatGPT internally phrased searches explicitly, such as “Where to buy qualified electronic signature Germany”. The inclusion of the “Germany” context could result from ChatGPT injecting user background information. Searches included:
- “QES” product buy
- “trust service provider” qualified electronic signature buy
- Where to buy qualified electronic signature Germany
- QES InfoCert qualified electronic signature purchase
These queries returned diverse results, including knowledge-base articles and YouTube videos.
(Robert MacCloy, who specializes in AI-driven search, said on the Latent Space podcast episode “AI Eats Search” that websites don’t get any insight into the underlying search query that triggered that retrieval. I wonder if display of these queries may actually provide some insights.)
Web Research with PowerPoint Presentation
Extending the QES web research into a PowerPoint presentation:
ChatGPT clarified the meaning of “QES” before spending 35 minutes generating the presentation, including around 10 minutes spent fixing JavaScript code that relies on pptxgenjs:
Agent Mode utilized stock images, font-based icons, and custom images created via ImageGen:
As a final step, Agent Mode automatically reviewed each slide individually:
After completion, the slide deck was ready for human review and download:
PowerPoint translation
Translating a slide deck (in PPTX format) from English to German mostly worked, with inline graphics and bullet points left intact:
Linguistically, Agent Mode produces a smoother result - particularly where German and English were mixed in the original.
One drawback, however: inline links were discarded by Agent Mode:
Upon inspection, the problem of Agent Mode discarding inline links from presentations seems fundamental: behind the scenes, the slides are worked on using ad-hoc generated Python scripts, and the library used discards any text formatting of text fragments (so-called “runs”) it is working on.
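The mechanism can be illustrated with a simplified model of runs in plain Python. This is a sketch under assumptions, not the API of the library Agent Mode actually used: `Run`, `translate_flattened` and `translate_per_run` are hypothetical names, and `str.upper` stands in for a real translation step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Run:
    """A text fragment with its own formatting; only the hyperlink is modeled here."""
    text: str
    hyperlink: Optional[str] = None


def translate_flattened(runs: List[Run], translate: Callable[[str], str]) -> List[Run]:
    """Join all runs, translate the whole paragraph, write it back as one plain run.

    The translator sees full context, but every run-level attribute is lost.
    """
    joined = "".join(run.text for run in runs)
    return [Run(translate(joined))]


def translate_per_run(runs: List[Run], translate: Callable[[str], str]) -> List[Run]:
    """Translate each run separately and copy its hyperlink across."""
    return [Run(translate(run.text), run.hyperlink) for run in runs]


paragraph = [
    Run("See the "),
    Run("documentation", hyperlink="https://example.com/docs"),  # placeholder URL
    Run(" for details."),
]

flattened = translate_flattened(paragraph, str.upper)
per_run = translate_per_run(paragraph, str.upper)

print([run.hyperlink for run in flattened])
print([run.hyperlink for run in per_run])
```

The flattened variant is the simpler script to write, which may explain why an ad-hoc generated script ends up there. The per-run variant keeps the links, but hands the translator fragments without sentence context, so a real translation is likely to read worse. A more careful approach would have to translate the paragraph as a whole and then map links back onto the translated text.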
Takeaways
- OpenAI provides guidelines on data safety and privacy risk mitigation.
- “Reading Mode” for websites might utilize Linux tools like curl and sed - status messages indicated as much. SEO professionals might consider this in optimization efforts.
- “Page not found” errors at various sites could indicate stale indexes, confabulation, or issues with client-agent detection — a potential area for SEO investigation.
- User reports indicate inconsistent results; outcomes may vary significantly.
- Failure modes extend beyond hallucinations, as ChatGPT Agent Mode “inherits” the limitations of the open-source software packages it uses - as experienced with the discarded links.
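Regarding the “Reading Mode” takeaway above: the status messages only hinted at curl and sed, so the following Python sketch is an assumption about what a text-only reading pipeline might roughly do, not a description of OpenAI’s actual tooling. It uses only the standard library; `TextExtractor`, `html_to_text` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its text content, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><title>QES</title><script>var x = 1;</script></head>"
    "<body><h1>Buy QES</h1><p>Providers are listed below.</p></body></html>"
)
text = html_to_text(sample)
print(text)
```

If reading mode works along these lines, content that only appears after JavaScript has run would never reach the model, which is the part SEO professionals may want to check.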
[Update 2025-08-01] Added translation example