On the persistent issue of factuality hallucinations in Large Language Models (LLMs), a LinkedIn post by Maxime Labonne gave the example of the “Indigo Sock Game” - a non-existent game that, according to him, most models will nonetheless confidently describe when prompted. This phenomenon underscores the ongoing challenge of ensuring LLM reliability... Read more 23 Aug 2024 - 1 minute read
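To make the failure mode concrete, here is a minimal sketch of such a hallucination probe, assuming the official openai Python client (v1+) and an API key in the environment; the model choice and prompt phrasing are illustrative, not Labonne's actual setup:

```python
# Hallucination probe: ask a model about a game that does not exist.
# Assumes the openai Python client (>= 1.0) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    temperature=0,
    messages=[{"role": "user",
               "content": "Explain the rules of the Indigo Sock Game."}],
)

print(response.choices[0].message.content)
# A reliable model should say it doesn't know this game;
# a hallucinating one will confidently invent rules for it.
```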
Recent discussions around the Aidan Bench (https://github.com/aidanmclaughlin/Aidan-Bench) have highlighted the significant impact of temperature settings and sampling methods on benchmark results for large language models (LLMs). Sam Paech’s experiments (https://x.com/sam_paech/status/1823295200724398244) with the GPT-4o-mini model demonstrate... Read more 13 Aug 2024 - less than 1 minute read
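As a rough illustration of why sampling settings matter for such benchmarks, the sketch below counts distinct completions at different temperatures; it assumes the openai Python client, and the prompt, model, and sample counts are made up for the example rather than taken from the Aidan Bench protocol:

```python
# Count distinct completions per temperature to show how sampling
# settings change output diversity (and hence diversity-sensitive scores).
# Assumes the openai Python client (>= 1.0) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()
PROMPT = "Name one novel use for a paperclip."  # illustrative prompt

for temperature in (0.0, 0.7, 1.2):
    answers = set()
    for _ in range(5):  # small sample, just for the sketch
        r = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=temperature,
            messages=[{"role": "user", "content": PROMPT}],
        )
        answers.add(r.choices[0].message.content.strip())
    # Higher temperatures typically yield more distinct answers,
    # which can shift scores on benchmarks that reward novelty.
    print(f"temperature={temperature}: {len(answers)} distinct answers")
```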
A commenter previously shared that deepfake methods struggle to forge faces seen from the side (cheeks, ears). Not anymore: “In this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. […] we propose a framework comprising prior learning...” Read more 13 Aug 2024 - less than 1 minute read
Headline at Wired: “Microsoft’s AI Can Be Turned Into an Automated Phishing Machine”. It is, or at least was, even worse than just phishing: someone retrieved a confidential document (signed with Docusign) from a public Copilot (post on X). This is a nice case of the distinction drawn by OpenAI’s Miles about what gets deployed: it’s not primarily the AI Foundation Model that’s... Read more 10 Aug 2024 - less than 1 minute read
Simon Willison, creator of Datasette and co-creator of Django, recently asked on Twitter for a “vibe check” on Llama 3.1 405B. He was particularly interested in whether it’s becoming a credible self-hosted alternative to the best OpenAI or Anthropic models, and whether any companies previously hesitant about sending data to API providers are now using... Read more 30 Jul 2024 - 1 minute read