Following up on our conversation about quantized models on smartphones, Stefano Fiorucci wrote a post about how to run a small language model on a smartphone. This involves either the Layla Lite app or Termux. One commenter recommended LLM Farm on iPhone. Read more 09 Apr 2024 - less than 1 minute read
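For the Termux route, here is a minimal sketch of what running a small quantized model on-device can look like, using llama-cpp-python; the GGUF file name and the thread count are placeholder assumptions, not settings from the post:

```python
# A minimal sketch of running a small quantized model on-device,
# e.g. inside Termux on Android (pip install llama-cpp-python).
# The GGUF file name below is a placeholder assumption.
from llama_cpp import Llama

llm = Llama(
    model_path="tinyllama-1.1b-chat.Q4_K_M.gguf",  # any small quantized model
    n_ctx=2048,    # modest context window to fit phone memory
    n_threads=4,   # roughly match the phone's performance cores
)

out = llm(
    "Q: Name one use for a local language model. A:",
    max_tokens=48,
    stop=["\n"],
)
print(out["choices"][0]["text"].strip())
```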
RAGfluencers are, of course, discontented with Gemini’s very large 1 million token context window, noting the high costs associated with a large number of input tokens: “It feels like a very niche use case”. The niche for the 1M tokens would be multi-modality in general and video in particular. My modest experiments suggest that the model does ... Read more 03 Apr 2024 - 1 minute read
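To make the cost concern concrete, a back-of-the-envelope calculation; the per-token price is a placeholder assumption, not an actual Gemini rate:

```python
# Rough input cost of fully using a 1M-token context window.
# The price is a placeholder assumption, not a quoted Gemini rate.
usd_per_million_input_tokens = 7.00   # hypothetical rate
context_tokens = 1_000_000
requests_per_day = 1_000              # e.g. a busy RAG-style workload

cost_per_request = context_tokens / 1e6 * usd_per_million_input_tokens
print(f"One fully packed request: ${cost_per_request:.2f}")
print(f"Per day, {requests_per_day} requests: "
      f"${requests_per_day * cost_per_request:,.2f}")
```

At any rate in that ballpark, a single fully packed request costs a few dollars, which is why stuffing an entire corpus into the context rarely beats retrieval on cost.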
A paper on “A Meeting Assistant Benchmark for Long-Context Language Models” comes with a remarkable side note: We also provide a thorough analysis of our GPT-4-based evaluation method, encompassing insights from a crowdsourcing study. Our findings suggest that while GPT-4’s evaluation scores are correlated with human judges’, its ability to differ... Read more 01 Apr 2024 - less than 1 minute read
A lot has been (and continues to be) written about the xz backdoor. What is even more troubling, however, is that this is yet another demonstrated open-source supply-chain attack, perhaps prepared years in advance. It could have hit(*) any downstream maintainer, just like with the faker.js incident, but there were two possible evasion f... Read more 30 Mar 2024 - less than 1 minute read
A poster on LinkedIn highlighted the Xenova Tokenizer Playground to compare tokenizer efficiency. I remarked: There is, however, a difference between what this Playground calculates and what the relevant APIs report as actually used (and therefore billed) input tokens. With a short German sentence: Xenova Playground „Claude“: 13 tokens; Clau... Read more 26 Mar 2024 - less than 1 minute read
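A sketch of how one could reproduce the comparison locally; the Xenova tokenizer repo name on Hugging Face, the model id, and the exact usage field are assumptions based on public docs, not details from the original post:

```python
# Compare a local tokenizer count (what the Playground shows) with the
# input tokens the API actually reports and bills.
from transformers import AutoTokenizer
import anthropic

sentence = "Das ist ein kurzer deutscher Satz."  # a short German sentence

# Local count, roughly what the Tokenizer Playground computes
tok = AutoTokenizer.from_pretrained("Xenova/claude-tokenizer")  # assumed repo
print("Local tokenizer:", len(tok.encode(sentence)), "tokens")

# Billed count, as reported by the Messages API itself
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
msg = client.messages.create(
    model="claude-3-haiku-20240307",  # assumed model id
    max_tokens=1,
    messages=[{"role": "user", "content": sentence}],
)
print("API-billed input tokens:", msg.usage.input_tokens)
```

The gap plausibly comes from the chat framing the API wraps around the raw text, which a bare tokenizer run does not count.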