Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

OpenAI Deep Research

To continue the trend of “long-running AI”, OpenAI have launched “Deep Research” (not in the EU, though): Deep research is OpenAI’s next agent that can do work for you independently—you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analys... Read more

OpenAI o3-mini

OpenAI have released o3-mini. It’s the first model that scores below 1% on the Vectara Hallucination Benchmark. This is in stark contrast to DeepSeek R1, which is off their chart at 14.3% hallucination and underperforming small language models like Amazon Titan Express. o3-mini comes in three strengths, determined by the “reasoning effort” (dubb... Read more

Behind the Nvidia stock crash

When Nvidia stock crashed an unprecedented 18%, allegedly because DeepSeek had used less capable chips, many were confused. Some pointed at the Jevons paradox, which describes how efficiency gains cause an increase in use to a degree where the gains evaporate. But as X user Alexander Doria was quick to point out: DeepSeek has tra... Read more

OpenAI's Operator: AI-Assisted Web Navigation

OpenAI has launched “Operator”, a new mode within ChatGPT that aims to navigate web-based workflows as a human would. This development is part of a broader trend in AI-assisted task completion and represents a significant step forward in the practical application of language models. Key Points Availability: Currently limited to US-based Cha... Read more

[UPDATED] DeepSeek R1 Reasoning Model

Chinese AI Lab DeepSeek have published R1, a model that sits in the category of “reasoning models” - like OpenAI o1. The original R1 is huge in size - 671B parameters. But there are also smaller, distilled versions available. Simon Willison managed to get some of these smaller ones running on his MacBook Pro (thread). Using the full version that is... Read more