Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

LLM Chat Conversion Tool

With ChatGPT offering features not found in the API, like o1-pro previously or o3 with integrated web search, users may want to switch back and forth between ChatGPT, the Prompts Playground, and perhaps archive to Markdown. Enter Chatbot Conversation Converter: A Python utility that converts chat conversations between different formats, inclu... Read more

EMBER: Epistemic Markers as a Stress‑Test for LLM‑based Evaluation

In my note on LLM as a judge, I pointed out a study where GPT‑4 (0613) aligned well with human ratings. A new paper – “Are LLM‑Judges Robust to Expressions of Uncertainty?” – asks what happens once those answers include explicit markers of certainty or doubt. The authors provide “EMBER”, a benchmark that patches existing QA and instruction‑follo... Read more

[UPDATED] Grounding ChatGPT etc by disabling Web Search

One problem with current AI Assistants including ChatGPT and Microsoft Copilot Chat is that they are chronically online: they are not a more or less pure LLM experience anymore, but search the web through tool-use. While this helps to keep answers up-to-date beyond the LLM training cut-off date, there’s a disadvantage in that the UIs don’t allow... Read more

Grok 3 Mini Pricing

Artificial Analysis, an “Independent analysis of AI models and hosting providers” outlet, have published a chart plotting “Intelligence Index vs. Price” (X post). This places Grok 3 Mini Reasoning at the upper left corner, inside the “Most attractive quadrant”. According to this, it has the best intelligence/price ratio - even at High-Reasoning ... Read more

[UPDATED] OpenAI Web Search, o3 API

In addition to the newly released OpenAI models, I have added Web Search to my LLM frontend. This allows up-to-date information to be worked with: Prompt: when is the new German chancellor going to be sworn in? Response: Friedrich Merz is scheduled to be elected as Germany’s new Chancellor on May 6, 2025. (reuters.com) […] Source references... Read more