A recent paper titled “Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?” has garnered some attention through German tech media. The study claims to demonstrate fundamental limitations in visual language models’ (VLMs) reasoning capabilities. However, upon closer examination of their methodology and code implementation, several is... Read more 02 Nov 2024 - 2 minute read
Motivated by starkly different results from different Llama 3.1 405B providers on the one hand, and claims, particularly those derived from the Chatbot Arena, that quantized versions are no different on the other, I have been wishing for a telltale sign that 1) conclusively proves otherwise and 2) tells providers apart. Good news: Simon Willison has s... Read more 26 Oct 2024 - less than 1 minute read
Two recent studies provide insights into the adoption and perception of artificial intelligence in Germany. The Bitkom study on “AI Usage in Germany” and Deutsche Telekom’s international YouGov survey paint a nuanced picture that challenges some common assumptions. Surprising Findings from the Bitkom Study The Bitkom study, presented at the Di... Read more 26 Oct 2024 - 1 minute read
Anthropic have published the “Computer Use Demo” in their Quickstarts GitHub repository. The approach taken is fundamentally different from my Aileen project: it’s not confined to a browser controlled through Selenium and very tight guardrails, but instead controls a full GNU/Linux desktop - which is separate from the user desktop session. On the... Read more 22 Oct 2024 - 3 minute read
A recent paper from Apple about reasoning deficits has been widely reposted as “LLMs Can’t Reason”. The study claims to demonstrate significant limitations in the reasoning capabilities of large language models (LLMs). Gary Marcus, author of a “Forbes 7 Must Read Books in AI”, railed: “There is just no way can you build reliable agents on this f... Read more 20 Oct 2024 - 3 minute read