Nils Durner's Blog Ahas, Breadcrumbs, Coding Epiphanies

Llama 4 released

Llama 4 has been released - partially. It’s a suite of three LLMs, with the biggest model (“Behemoth”) still in training. Notes:

  • ~apparently no restrictions on use within the EU~ [see Update below]
  • ~trained in fp8 precision~ [see Update below]
  • already live on OpenRouter
    • via Together: in my first tests, the smallest model, “Scout”, came out costlier than Gemini 1.5 Pro
    • also available from Groq and Fireworks, much cheaper than Together
  • Hugging Face model collection
  • Maxime Labonne points out license problems

[Update] The Acceptable Use Policy does place restrictions on EU citizens:

With respect to any multimodal models included in Llama 4, the rights granted under Section 1(a) of the Llama 4 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. (via)

[Update 2025-04-06] While it was trained in fp8, the weights are released in bf16:

The Llama 4 Scout model is released as BF16 weights, but can fit within a single H100 GPU with on-the-fly int4 quantization; the Llama 4 Maverick model is released as both BF16 and FP8 quantized weights. The FP8 quantized weights fit on a single H100 DGX host while still maintaining quality. (Model card)
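The model card's claim checks out on a back-of-envelope basis. A minimal sketch, assuming Scout's publicly reported total parameter count of roughly 109B (the exact figure and runtime overheads like KV cache are not in the quote above):

```python
# Rough VRAM needed just for model weights at a given precision.
# SCOUT_PARAMS is an assumed approximate figure (~109B total parameters);
# activation memory and KV cache are ignored.

def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bits_per_param / 8 / 2**30

SCOUT_PARAMS = 109e9  # assumption, not from the model card quote

for bits, label in [(16, "bf16"), (8, "fp8"), (4, "int4")]:
    print(f"Scout @ {label}: ~{weight_gib(SCOUT_PARAMS, bits):.0f} GiB")
```

At bf16 this lands around 200 GiB and at int4 around 50 GiB, so of the three only the int4 variant fits the 80 GB of a single H100 - consistent with the card's "on-the-fly int4 quantization" framing.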

On OpenRouter, the Groq, Fireworks and Together endpoints are not advertised as bf16, but DeepInfra’s is.

[Update 2025-04-06 #2] Testing both Scout and Maverick (fp8) on the process visualization practice examples from my recent article, they mostly fail - and are never cheaper than Gemini 1.5 Pro.