Simon Willison, creator of Datasette and co-creator of Django, recently asked on Twitter for a “vibe check” on Llama 3.1 405B. He was particularly interested in whether it’s becoming a credible self-hosted alternative to the best OpenAI or Anthropic models, and if any companies previously hesitant about sending data to API providers are now usin... Read more 30 Jul 2024 - 1 minute read
Insightful article in WSJ, as anti-AI sentiment seems to be growing: Technology providers increasingly offer kitted-out AI premium products, although they have yet to gain traction among many enterprise customers. Tools like Copilot for Microsoft 365 or Gemini for Google Workspace are turning out to require a lot of hand-holding to make them ... Read more 29 Jul 2024 - 1 minute read
A recent paper in Nature, “AI models collapse when trained on recursively generated data” by Shumailov et al., has sparked a heated debate in the AI community about the potential risks of using synthetic data for training language models. The paper suggests that indiscriminate use of model-generated content in training can cause irreversible def... Read more 28 Jul 2024 - 2 minute read
Ethan Mollick, Associate Professor at The Wharton School, recently noted some significant gaps in current LLM benchmarking: no benchmark for LLM hallucination rates; few benchmarks with human comparisons; and a lack of common benchmarks for use cases like innovation, writing, persuasion, human interaction, education, and creativity. Mollick poi... Read more 20 Jul 2024 - less than 1 minute read
The University of Milano-Bicocca has published a significant work for Generative AI in Italy. As Alessandro Vitale notes in his LinkedIn post, there was previously no benchmark to understand how well LLMs performed in Italian. The new benchmark adapts INVALSI tests, which are typically given to Italian students in elementary, middle, and high sc... Read more 15 Jul 2024 - 1 minute read