Chinese AI Lab DeepSeek have published R1, a model that sits in the category of “reasoning models” - like OpenAI o1. The original R1 is huge in size - 671B parameters. But there are also smaller, distilled versions available. Simon Willison managed to get some of these smaller ones running on his MacBook Pro (thread). Using the full version that is served up through chat.deepseek.com, I could run Simon’s pelican benchmark - and the first result is already impressive by comparison.
[Update 2025-01-23] OpenRouter have added inference from providers in the US. Nice from a data protection standpoint, but more expensive than DeepSeek themselves.
[Update 2025-01-24] An upcoming article of mine will detail some real-world tasks from my AI-first practice. R1 is successful on the first, easiest task (all of the Frontier Models are) but fails on the hardest task. All traditional models crack this hardest task with clever prompting, and so does o1 in Pro mode. Interestingly, R1 seems to get the task right in its reasoning steps, but fails to synthesize the output correctly - it’s utterly unusable. Also, when used via OpenRouter + Together AI or Fireworks, R1 is not cheaper - or even more expensive - than traditional models with clever prompting.
[Update 2025-01-27] Several users on X highlight that the [DeepSeek privacy policy] reserves the right to train on user inputs:
Review, improve, and develop the Service, including […] training and improving our technology.
It’s worth mentioning that the contractual parties per the DeepSeek Terms of Use are two limited liability companies in China. This applies when using the aforementioned chat.deepseek.com service, not when using the US providers, e.g. via OpenRouter.
[Update 2025-01-27]
DeepSeek has gone viral, and false information abounds. Helen Toner has curated a good thread, and Philipp Schmidt dispels some of the myths. Stock markets are down over fears that I think are irrational: several users on X point to the Jevons paradox, where cheaper goods entail more use (Ethan Mollick).
Meanwhile, R1 providers are struggling with demand, with some users reporting them to be unusable - including DeepSeek. People look at the smaller derivatives, but Ethan Mollick’s verdict is: “it’s kind of depressing”. Interestingly, he’s seeing the same behaviour as I did with my still-private test set for the upcoming article:
You get the reasoning, but the model is not smart enough to actually use it
[Update 2025-01-28] German association LAION reran their “Alice in Wonderland” study. Verdict: “We see that DeepSeek R1 is fragile, far from claimed match to o1-preview, while matching o1-mini clearly outperforming non-reasoning LLMs.” Looking at their results, one traditional LLM that is not clearly outperformed is Claude 3.5 Sonnet.
[Update 2025-01-30] Two new options outside China:
- HuggingFace Pro users get a $2 allowance through Inference Partners.
- Microsoft Azure AI Foundry, currently free of cost? Only provided from US datacenter regions, though. Also available via GitHub. (Blog).
It’s a little awkward to use on the Playground, as the reasoning monologue that precedes the actual answer is not separated from it.
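If you call the model programmatically rather than through the Playground, you can separate the two parts yourself. The following is a minimal sketch, not a tested integration: it assumes the raw output wraps the reasoning in `<think>…</think>` tags, as the open R1 weights do, which may not hold for every provider. The function name `split_reasoning` and the sample string are made up for illustration.

```python
import re

# Assumption: the reasoning is wrapped in <think>...</think> tags.
# Providers may format or strip this differently.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model response.

    If no <think> block is found, the reasoning is empty and the
    whole response is treated as the answer.
    """
    match = THINK_BLOCK.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block counts as the answer.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer


# Hypothetical sample response, for illustration only.
sample = "<think>\nThe user wants a greeting.\n</think>\n\nHello!"
reasoning, answer = split_reasoning(sample)
print(answer)
```

Only the first `<think>` block is treated as reasoning here; a response with several such blocks would need a different rule.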