Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

OpenAI o3-mini

OpenAI have released o3-mini. It’s the first model that scores below 1% on the Vectara Hallucination Benchmark. This is in stark contrast to DeepSeek R1, which is off their chart at 14.3% hallucination and underperforms small language models like Amazon Titan Express. o3-mini comes in three strengths, determined by the “reasoning effort” (dubb... Read more

Behind the Nvidia stock crash

When Nvidia stock crashed an unprecedented 18%, reportedly because DeepSeek had allegedly used less capable chips, many were confused. Some pointed to the Jevons paradox, which describes how efficiency gains can increase use to a degree where the gains evaporate. But as X user Alexander Doria was quick to point out: DeepSeek has tra... Read more

OpenAI's Operator: AI-Assisted Web Navigation

OpenAI has launched “Operator”, a new mode within ChatGPT that aims to navigate web-based workflows as a human would. This development is part of a broader trend in AI-assisted task completion and represents a significant step forward in the practical application of language models. Key Points Availability: Currently limited to US-based Cha... Read more

[UPDATED] DeepSeek R1 Reasoning Model

Chinese AI lab DeepSeek have published R1, a model that sits in the category of “reasoning models”, like OpenAI o1. The original R1 is huge, at 671B parameters, but there are also smaller, distilled versions available. Simon Willison managed to get some of these smaller ones running on his MacBook Pro (thread). Using the full version that is... Read more

Microsoft Copilot Chat

Microsoft has launched Copilot Chat, an offering similar to ChatGPT:

Blog post: https://aka.ms/CopilotChat
LinkedIn: Microsoft, CMO
Web version: https://copilot.cloud.microsoft/
WhatsApp, Telegram: Copilot for Social Apps

It seems generally free to use. There are some advanced features¹ on a pay-per-use basis, and a $... Read more