Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

PDFs, LLMs, and 'The Bitter Lesson' in Document AI

In a recent LinkedIn discussion around a review of PDF-focused AI assistants (ComputerBase article), I touched on limitations, workarounds, and the ongoing tension between domain-specific tooling and general-purpose, ever-evolving AI platforms. The Article in Brief: The ComputerBase article put Adobe’s new AI Assistant, Google’s NotebookLM, and... Read more

LLM Chat Conversion Tool

With ChatGPT offering features not found in the API, like o1-pro previously or o3 with integrated web search, users may want to switch back and forth between ChatGPT, the Prompts Playground, and perhaps archive to Markdown. Enter Chatbot Conversation Converter: A Python utility that converts chat conversations between different formats, inclu... Read more

EMBER: Epistemic Markers as a Stress‑Test for LLM‑based Evaluation

In my note on LLM as a judge, I pointed out a study where GPT‑4 (0613) aligned well with human ratings. A new paper – “Are LLM‑Judges Robust to Expressions of Uncertainty?” – asks what happens once those answers include explicit markers of certainty or doubt. The authors provide “EMBER”, a benchmark that patches existing QA and instruction‑follo... Read more

[UPDATED] Grounding ChatGPT etc. by disabling Web Search

One problem with current AI assistants, including ChatGPT and Microsoft Copilot Chat, is that they are chronically online: they no longer offer a more or less pure LLM experience, but search the web through tool use. While this helps to keep answers up to date beyond the LLM training cut-off date, there’s a disadvantage in that the UIs don’t allow... Read more

Grok 3 Mini Pricing

Artificial Analysis, an “Independent analysis of AI models and hosting providers” outlet, have published a chart plotting “Intelligence Index vs. Price” (X post). This places Grok 3 Mini Reasoning at the upper left corner, inside the “Most attractive quadrant”. According to this, it has the best intelligence/price ratio - even at High-Reasoning ... Read more