Nils Durner's Blog: Ahas, Breadcrumbs, Coding Epiphanies

LLM Tokenizer comparison

A poster on LinkedIn highlighted the Xenova Tokenizer Playground as a way to compare tokenizer efficiency.

I remarked:

There is, however, a difference between what this Playground calculates and what the relevant APIs report as actually used (and therefore billed) input tokens. With a short German sentence:

  • Xenova Playground "Claude": 13 tokens
  • Claude 2.1 API: 22 tokens
  • Claude 3 API: 20 tokens
  • Xenova Playground "gpt-4/…": 11 tokens
  • OpenAI API: 28 tokens

Note that tokenization appears to differ between Claude 2 and Claude 3, which the Xenova Playground doesn't account for either.

(German sentence: "Was ist das englische Wort für Dokument?", i.e. "What is the English word for document?")
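Part of the gap between a playground count and the billed count can be explained by chat-format scaffolding: the playground tokenizes only the raw text, while the Chat Completions API wraps every message in role markers and separators. A minimal sketch of that arithmetic, using the per-message overhead constants published in the OpenAI cookbook's token-counting guide for gpt-4 (these are heuristics, not values confirmed for this specific request):

```python
# Heuristic constants from the OpenAI cookbook's token-counting guide
# for gpt-4-style chat formatting (assumptions, not measured here):
TOKENS_PER_MESSAGE = 3  # role marker and separators around each message
REPLY_PRIMING = 3       # every request ends with assistant-reply priming

def billed_input_tokens(per_message_text_tokens: list[int]) -> int:
    """Estimate billed input tokens from raw per-message text token counts."""
    total = sum(per_message_text_tokens)
    total += TOKENS_PER_MESSAGE * len(per_message_text_tokens)
    total += REPLY_PRIMING
    return total

# One user message whose raw text tokenizes to 11 tokens
# (the playground's gpt-4 count for the German sentence):
print(billed_input_tokens([11]))  # 17
```

Even with this overhead the estimate lands at 17, still short of the 28 tokens the API reported, so additional scaffolding (for example a system message) likely accounts for the remainder.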