OpenAI Deep Research

To continue the trend of “long-running AI”, OpenAI have launched “Deep Research” (not available in the EU, though): Deep research is OpenAI’s next agent that can do work for you independently—you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analys...
03 Feb 2025 - less than 1 minute read

OpenAI o3-mini

OpenAI have released o3-mini. It’s the first model to score below 1% on the Vectara Hallucination Benchmark. This is in stark contrast to DeepSeek R1, which is off their chart at a 14.3% hallucination rate, underperforming even small language models like Amazon Titan Express. o3-mini comes in three strengths, determined by the “reasoning effort” (dubb...
02 Feb 2025 - less than 1 minute read

Behind the Nvidia stock crash

When Nvidia stock crashed by an unprecedented 18%, allegedly because DeepSeek had reportedly used less capable chips, many were confused. Some pointed to the Jevons paradox, which describes how efficiency gains can increase overall consumption to the point where the savings evaporate. But as X user Alexander Doria was quick to point out: DeepSeek has tra...
02 Feb 2025 - 1 minute read

OpenAI's Operator: AI-Assisted Web Navigation

OpenAI has launched “Operator”, a new mode within ChatGPT that aims to navigate web-based workflows as a human would. This development is part of a broader trend in AI-assisted task completion and represents a significant step forward in the practical application of language models. Key Points Availability: Currently limited to US-based Cha...
24 Jan 2025 - 3 minute read

[UPDATED] DeepSeek R1 Reasoning Model

Chinese AI lab DeepSeek have published R1, a model in the category of “reasoning models”, like OpenAI o1. The original R1 is huge: 671B parameters. But there are also smaller, distilled versions available. Simon Willison managed to get some of these smaller ones running on his MacBook Pro (thread). Using the full version that is...
30 Jan 2025 (Updated) - 2 minute read
