WildVision Arena Benchmark

Ethan Mollick remarks that “people realize how capable multimodal AI is, right now, out of the box”. I agree. Those who want to try and compare different models can use https://huggingface.co/spaces/WildVision/vision-arena. Besides the usual suspects, I recommend trying Reka AI Flash.
07 May 2024 - less than 1 minute read

Sora: manual post-processing

Plenty of background on the practicalities, from one of the teams in the Sora private beta: https://www.fxguide.com/fxfeatured/actually-using-sora/
25 Apr 2024 - less than 1 minute read

GLiNER NER model

Urchade Zaratiana has released GLiNER, an NER model “that can identify any type of entity using a bidirectional transformer encoder”. One of the listed ToDos is “Allow longer context”, and a closer look at the Colab shows that it caps out at 384 tokens and confirms the DebertaV2 architecture.
25 Apr 2024 - less than 1 minute read
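Until longer context is supported, one common workaround for a fixed input cap is to split long documents into overlapping windows and run the model on each. The sketch below is a hypothetical helper, not part of GLiNER; it assumes whitespace-separated words as a rough proxy for the model's subword tokens, so the default of 250 words per window is a conservative guess to stay under the 384-token cap.

```python
def chunk_words(text: str, max_words: int = 250, overlap: int = 50) -> list[tuple[int, str]]:
    """Split text into overlapping windows of at most max_words words.

    Hypothetical helper: words (whitespace split) stand in for subword tokens,
    so max_words should be set well below the model's 384-token limit.
    Returns (first_word_index, chunk_text) pairs so that entities found in a
    chunk can be mapped back to positions in the original text.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(words):
        chunks.append((start, " ".join(words[start:start + max_words])))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

Each chunk would then be passed to the model separately (in the `gliner` package, presumably via `predict_entities(chunk, labels)` on a loaded model), and entities found twice in the overlapping regions would need to be deduplicated.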

[UPDATED] OBS Stream Recording

How does one record a live stream unattended, perhaps using AWS EC2? A discussion on Reddit led me to base everything on a g5.xlarge instance.
25 Sep 2024 (Updated) - 2 minute read
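Starting and stopping the recording machine is the part that is easy to script. The sketch below assumes boto3's EC2 client (`run_instances`, `stop_instances`); the AMI ID, key pair name, and `Name` tag are hypothetical placeholders, and installing and configuring OBS on the instance is not covered. The client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
from typing import Any

INSTANCE_TYPE = "g5.xlarge"  # the instance type mentioned in the post


def build_launch_params(ami_id: str, key_name: str, name_tag: str = "obs-recorder") -> dict[str, Any]:
    """Keyword arguments for a single run_instances call.

    ami_id, key_name and name_tag are hypothetical placeholders: use an AMI
    with OBS (and GPU drivers) installed and an existing EC2 key pair.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": INSTANCE_TYPE,
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": name_tag}]}
        ],
    }


def launch_recorder(ec2_client: Any, ami_id: str, key_name: str) -> str:
    """Start one recording instance and return its instance ID."""
    response = ec2_client.run_instances(**build_launch_params(ami_id, key_name))
    return response["Instances"][0]["InstanceId"]


def stop_recorder(ec2_client: Any, instance_id: str) -> None:
    """Stop (not terminate) the instance so the EBS root volume, and any
    recordings saved on it, survive until they are copied off."""
    ec2_client.stop_instances(InstanceIds=[instance_id])
```

With real credentials, the client would come from `boto3.client("ec2", region_name=...)`; stopping rather than terminating is a deliberate choice here, since termination normally deletes the root volume along with anything recorded onto it.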

[UPDATED] Chatbot Arena Trickery

Investigates evaluation “trickery” in chatbot tournaments, showing how adversarial inputs can skew leaderboard results and recommending robust testing protocols.
17 Apr 2024 (Updated) - 2 minute read
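As background, arena-style leaderboards turn pairwise human votes into ratings. A minimal sketch of an online Elo update, assuming the classic logistic formula with an illustrative K of 32 and a starting rating of 1000 (not necessarily the parameters LMSYS uses), shows how each vote moves both models' scores. The model names in the usage note are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(ratings: dict[str, float], a: str, b: str, score_a: float,
               k: float = 32.0, initial: float = 1000.0) -> None:
    """Apply one vote in place: score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.

    k and initial are illustrative defaults, not LMSYS's published settings.
    """
    r_a = ratings.get(a, initial)
    r_b = ratings.get(b, initial)
    e_a = expected_score(r_a, r_b)
    # Each side moves by K * (actual - expected); the two deltas cancel out,
    # so the total rating mass stays constant.
    ratings[a] = r_a + k * (score_a - e_a)
    ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))


def replay(votes: list[tuple[str, str, float]], **kwargs: float) -> dict[str, float]:
    """Replay (model_a, model_b, score_a) votes in order and return final ratings."""
    ratings: dict[str, float] = {}
    for a, b, score_a in votes:
        elo_update(ratings, a, b, score_a, **kwargs)
    return ratings
```

For example, `replay([("model_x", "model_y", 1.0)])` moves `model_x` to 1016 and `model_y` to 984. Because a rating is just the accumulation of such per-vote updates, anything that systematically shifts which answer voters prefer also shifts the leaderboard.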
