Llama 4 has been released - partially. It’s a suite of three LLMs, with the biggest model (“Behemoth”) still in training. Notes:
- ~~apparently no restrictions on use within the EU~~ [see Update below]
- ~~trained in fp8 precision~~ [see Update below]
- already live on OpenRouter
- via Together: in my first tests, the smallest model, “Scout”, came out more expensive than Gemini 1.5 Pro
- also available from Groq and Fireworks, much cheaper than Together
- Hugging Face model collection
- Maxime Labonne points out license problems
[Update] The Acceptable Use Policy does place restrictions on EU citizens:
With respect to any multimodal models included in Llama 4, the rights granted under Section 1(a) of the Llama 4 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. (via)
[Update 2025-04-06] While it was trained in fp8, weights are released in bf16:
The Llama 4 Scout model is released as BF16 weights, but can fit within a single H100 GPU with on-the-fly int4 quantization; the Llama 4 Maverick model is released as both BF16 and FP8 quantized weights. The FP8 quantized weights fit on a single H100 DGX host while still maintaining quality. (Model card)
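A quick back-of-envelope check makes the model card's claim concrete. Assuming Scout's reported total parameter count of roughly 109B (an assumption on my part, not stated in the quote above), the weights alone need about 203 GiB at bf16 but only about 51 GiB at int4, which is why on-the-fly int4 quantization brings it under a single 80 GB H100:

```python
# Rough memory math for the model card's single-H100 claim.
# Assumptions: Scout has ~109B total parameters; an H100 has 80 GB.
# This ignores activation and KV-cache memory, so real headroom is smaller.

GiB = 1024**3

def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / GiB

scout_params = 109e9      # assumed total parameter count for Scout
h100_gib = 80             # single H100

bf16 = weight_gib(scout_params, 2.0)   # bf16 = 2 bytes/param
int4 = weight_gib(scout_params, 0.5)   # int4 = 0.5 bytes/param

print(f"bf16: {bf16:.0f} GiB, int4: {int4:.0f} GiB")
# bf16 (~203 GiB) needs multiple GPUs; int4 (~51 GiB) fits on one H100
```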
On OpenRouter, the Groq, Fireworks and Together providers are not listed as serving bf16, but DeepInfra is.
[Update 2025-04-06 #2] I tested both Scout and Maverick (fp8) on the process visualization practice examples from my recent article: they mostly fail, and they are never cheaper than Gemini 1.5 Pro.