ClashEval: When LLM Safeguards Clash with RAG. A recent paper ... Read more 20 May 2024 (Updated) - 1 minute read
A recent LinkedIn post about “DeutschlandGPT” caught my eye, promising an AI solution “Made in Germany”. Upon closer inspection, some concerning details emerged. The company behind DeutschlandGPT, as listed in their Impressum, is DeutschlandGPT GmbH based in Germering. This suggests they’re likely just an ordinary T-Systems customer rather than... Read more 24 Jun 2024 - 1 minute read
Peter Gostev, Head of AI at Moonpig, recently shared his experience with AI-generated videos on LinkedIn. It’s a topic I’ve been following closely, and I’d like to share my thoughts on this evolving technology. Gostev noted that while AI video generation has long seemed promising, the results have often fallen short. However, he reported a rece... Read more 02 Jun 2024 - 1 minute read
From Ethan Mollick’s LinkedIn post about AI-generated sound effects to the recent Heise iX article on AI in UI design, it’s clear that AI is making inroads into every aspect of digital creation. But as with any new technology, it’s crucial to approach these developments with a critical eye. The Heise iX article, for instance, shows that while C... Read more 31 May 2024 - 1 minute read
People on social media are excited about a new model on the LMSYS Chatbot Arena: gpt2-chatbot. Some theorize that it may be GPT-2. I fingerprinted the tokenizer, and no: it is not GPT-2, but it is consistent with OpenAI cl100k (used for GPT-3.5 onwards). Peculiar gaps in world knowledge (both niche and common knowledge) were the same as in the other GP... Read more 29 May 2024 - less than 1 minute read