Nils Durner's Blog: Ahas, Breadcrumbs, Coding Epiphanies

Mitigating LLM Hallucinations: The Power of System Prompts

On the persistent issue of factuality hallucinations in Large Language Models (LLMs), a LinkedIn post by Maxime Labonne used the “Indigo Sock Game” as an example: a non-existent game that, according to him, most models will nonetheless describe confidently when prompted. This phenomenon underscores the ongoing challenges in ensuring LLM reliabi...

LLM Benchmarks: The Impact of Temperature and Sampling

Recent discussions around the Aidan Bench (https://github.com/aidanmclaughlin/Aidan-Bench) have highlighted the significant impact of temperature settings and sampling methods on benchmark results for large language models (LLMs). Sam Paech’s experiments (https://x.com/sam_paech/status/1823295200724398244) with the GPT-4o-mini model demonstrate...
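
The excerpt does not show the exact settings used by Aidan Bench or in Sam Paech's experiments, so the following is only a generic illustration, not their code. It is a minimal sketch of how temperature scaling and nucleus (top-p) filtering reshape a next-token distribution before a token is drawn. The function names and the three-token logits are hypothetical; real inference stacks implement these steps internally and may differ in detail.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities; lower temperature sharpens, higher flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use greedy decoding for T=0)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def greedy(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature, top_p, rng):
    """Stochastic decoding: temperature scaling, then top-p filtering, then one draw."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (0.2, 1.0, 2.0):
        print("T =", t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    draws = [sample(logits, 1.0, 0.9, rng) for _ in range(1000)]
    print("greedy pick:", greedy(logits))
    print("sampled counts:", [draws.count(i) for i in range(len(logits))])
```

Running the same prompt under different temperatures or top-p values changes which tokens can be drawn at all, which is one plausible reason a benchmark score can shift with these settings even when the model is unchanged.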

3D Animatable Head Avatars

A commenter previously noted that deepfake methods struggle to forge faces seen from the side (cheeks, ears). Not anymore: “In this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. […] we propose a framework comprising prior learning...”

Microsoft AI: Phishing Machine

Headline at Wired: “Microsoft’s AI Can Be Turned Into an Automated Phishing Machine”. It is (or was) even worse than just phishing: someone retrieved a confidential document (signed with Docusign) from a public Copilot, as shown in a post on X. This is a nice illustration of the distinction OpenAI's Miles draws regarding what actually gets deployed: it’s not primarily the AI foundation model that’...

Llama 3.1 405B: Quantization and Hosting Challenges

Simon Willison, creator of Datasette and co-creator of Django, recently asked on Twitter for a “vibe check” on Llama 3.1 405B. He was particularly interested in whether it is becoming a credible self-hosted alternative to the best OpenAI or Anthropic models, and whether any companies previously hesitant about sending data to API providers are now usin...