Nils Durner's Blog: Ahas, Breadcrumbs, Coding Epiphanies

LLM Tokenizer comparison

A poster on LinkedIn highlighted the Xenova Tokenizer Playground as a way to compare tokenizer efficiency.

I remarked:

There is, however, a difference between what this Playground calculates and what the relevant APIs report as actually used (and therefore billed) input tokens. With a short German sentence:

  • Xenova Playground "Claude": 13 tokens
  • Claude 2.1 API: 22 tokens
  • Claude 3 API: 20 tokens
  • Xenova Playground "gpt-4/…": 11 tokens
  • OpenAI API: 28 tokens

Note that tokenization appears to differ between Claude 2 and Claude 3, which the Xenova Playground doesn't account for either.

(German sentence: "Was ist das englische Wort für Dokument?", i.e. "What is the English word for document?")
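Part of the gap between a playground count and the billed count can be explained by chat-format scaffolding: the playground tokenizes only the raw text, while the Chat Completions API wraps every message in role markers and separators. A minimal sketch of that arithmetic, using the per-message overhead constants published in the OpenAI cookbook's token-counting guide for gpt-4 (these are heuristics, not values confirmed for this specific request):

```python
# Heuristic constants from the OpenAI cookbook's token-counting guide
# for gpt-4-style chat formatting (assumptions, not measured here):
TOKENS_PER_MESSAGE = 3  # role marker and separators around each message
REPLY_PRIMING = 3       # every request ends with assistant-reply priming

def billed_input_tokens(per_message_text_tokens: list[int]) -> int:
    """Estimate billed input tokens from raw per-message text token counts."""
    total = sum(per_message_text_tokens)
    total += TOKENS_PER_MESSAGE * len(per_message_text_tokens)
    total += REPLY_PRIMING
    return total

# One user message whose raw text tokenizes to 11 tokens
# (the playground's gpt-4 count for the German sentence):
print(billed_input_tokens([11]))  # 17
```

Even with this overhead the estimate lands at 17, still short of the 28 tokens the API reported, so additional scaffolding (for example a system message) likely accounts for the remainder.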