Nils Durner's Blog: Ahas, Breadcrumbs, Coding Epiphanies

Italian LLM Benchmark: INVALSI for AI

The University of Milano-Bicocca has published a significant contribution to generative AI in Italy. As Alessandro Vitale notes in his LinkedIn post, there was previously no benchmark for measuring how well LLMs perform in Italian. The new benchmark adapts INVALSI tests, which are typically given to Italian students in elementary, middle, and high sc...

LivePortrait: Animating the Static

A video that’s currently captivating my social media timeline demonstrates a fascinating leap in AI-driven animation. Developed by Chinese research groups, this demo represents a significant milestone in what I’d love to see in a “Generative AI” product or service.

Improving LLM User Interfaces: The Case for Conversation Forking

Maxime Labonne, Staff Machine Learning Scientist at Liquid AI, recently posited that while LLMs themselves have made significant progress, their user interfaces haven’t kept pace.

LLM Pricing Comparisons: The Missing Tokenizer Efficiency Factor

Recently, Philipp Schmid shared an interactive LLM pricing comparison tool hosted on Hugging Face. This tool allows users to filter providers and models, comparing them side-by-side. It’s an impressive effort that includes a wide range of providers such as Fireworks AI, Groq, Replicate, and IBM.

IBM Granite Models - for Agents?

Recently, I came across a LinkedIn post by Armand Ruiz, VP of Product - AI Platform at IBM, discussing the differences between chatbots, copilots, and agents. While the post provided a general overview of these AI categories, it prompted me to inquire about a more specific topic: IBM’s Granite models.