Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

RAG in practice

Studies by Salesforce Research and Google DeepMind, as well as my own experiments, have previously cast fundamental doubt on RAG. Now Richard Meng has come forward with practical confirmation: “We’ve spoken with 30 companies who developed RAG-based chatbots on PDF documents. Every single one has failed.” The problems he shares are familiar: ...

Sharing OpenAI Deep Research

Sharing results from OpenAI Deep Research is not straightforward, as simple copy & paste will mangle the attributions. What works instead is lifting the HTML fragment from ChatGPT and building a document from there. Step-by-step: In the browser version of ChatGPT, target the title in the response report. Do a left-click, open the Developer ...
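The "building a document from there" part can be sketched roughly as follows. This is a minimal illustration, not the exact procedure from the full post: it assumes the fragment has already been copied out of the browser's developer tools as a string, and the function name `build_document` and the default title are hypothetical.

```python
from html import escape


def build_document(fragment: str, title: str = "Deep Research report") -> str:
    """Wrap a copied HTML fragment in a minimal standalone HTML5 page.

    The fragment is inserted unchanged, so links and citation markup
    inside it survive as they were copied.
    """
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"  # only the title is escaped
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical stand-in for a fragment lifted from the report.
    sample = '<h1>Report</h1><p>Claim <a href="https://example.com">source</a></p>'
    print(build_document(sample))
```

Saving the returned string to an `.html` file gives a page that opens in any browser with the source links still attached, which is what plain copy & paste loses.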

OpenAI Sales Associate and Deep Research demo

A “Virtual Sales Associate” was showcased briefly by OpenAI, perhaps as something like GPTs: “Enterprises can start to customize these applications [agents?]”. Also, OpenAI may be going after the data inside organizations: “The most valuable data is the data you have in your company. You can imagine us to expand Deep Research to internal informa...

OpenAI Deep Research

To continue the trend of “long-running AI”, OpenAI have launched “Deep Research” (not in the EU, though): Deep research is OpenAI’s next agent that can do work for you independently—you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analys...

OpenAI o3-mini

OpenAI have released o3-mini. It’s the first model that scores below 1% on the Vectara Hallucination Benchmark. This is in stark contrast to DeepSeek R1, which is off their chart at a 14.3% hallucination rate and underperforms small language models like Amazon Titan Express. o3-mini comes in three strengths, determined by the “reasoning effort” (dubb...