Model crosswalk
Side-by-side on price, capability and workload: a three-way comparison.
Llama 3.1 405B Instruct vs Mistral Large 2 vs Qwen 3 72B Instruct
Specs and cheapest providers
| Spec | Llama 3.1 405B Instruct | Mistral Large 2 | Qwen 3 72B Instruct |
|---|---|---|---|
| Parameters | 405B | 123B | 72B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | Llama 3.1 Community License | Mistral Research License | Qwen License |
| Released | 2024-07-23 | 2024-07-24 | 2025-04-28 |
| Cheapest provider | deepinfra | openrouter | fireworks-ai |
| Input, $ per 1M tokens | $2.70 | $1.80 | $0.22 🏆 |
| Output, $ per 1M tokens | $8.00 | $5.40 | $0.88 🏆 |
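To turn per-1M-token prices into a budget figure, a workload's cost is (input tokens ÷ 1M) × input price plus (output tokens ÷ 1M) × output price. The Python sketch below assumes cheapest-provider prices of $2.70/$8.00 (Llama 3.1 405B Instruct), $1.80/$5.40 (Mistral Large 2) and $0.22/$0.88 (Qwen 3 72B Instruct) per 1M input/output tokens, plus a hypothetical monthly workload of 50M input and 10M output tokens. Provider prices change, so substitute current figures. The `Pricing` class and `workload_cost` function are illustrative names, not part of any provider SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens at the cheapest listed provider (assumed values)."""
    input_per_m: float
    output_per_m: float


# Assumed cheapest-provider prices, USD per 1M tokens (input, output).
PRICES = {
    "Llama 3.1 405B Instruct": Pricing(2.70, 8.00),
    "Mistral Large 2": Pricing(1.80, 5.40),
    "Qwen 3 72B Instruct": Pricing(0.22, 0.88),
}


def workload_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD; input and output tokens are billed separately per 1M tokens."""
    return (input_tokens / 1_000_000) * p.input_per_m + (
        output_tokens / 1_000_000
    ) * p.output_per_m


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens, 10M output tokens.
    for name, pricing in PRICES.items():
        cost = workload_cost(pricing, 50_000_000, 10_000_000)
        print(f"{name}: ${cost:,.2f}")
```

Under these assumptions the workload comes to about $215 for Llama 3.1 405B, $144 for Mistral Large 2 and $19.80 for Qwen 3 72B. Output tokens cost roughly three to four times as much as input tokens for all three models, so output-heavy workloads widen the absolute gap.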
Benchmark comparison
No benchmark data available yet.
Editor's take
Llama 3.1 405B Instruct, Mistral Large 2, and Qwen 3 72B Instruct span a wide range of parameter counts and publisher strategies, but all three target high-quality open-weights inference for production use. Meta's 405B model (July 2024, Llama 3.1 Community License) sits at the frontier of what is available as open weights; Mistral Large 2 (123B, July 2024, Mistral Research License) and Qwen 3 72B (Alibaba, 2025, Qwen License) occupy a smaller tier, with different multilingual and licensing profiles.
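Qwen 3 72B is the most recent of the three, bringing 2025-generation instruction tuning, strong multilingual performance across CJK languages and Arabic, and a 131K-token context window. For teams serving non-English users, that multilingual breadth is a practical reason to shortlist it, although no benchmark scores are listed on this page to quantify its lead over the alternatives. The Qwen License permits commercial deployment, but it carries its own conditions, so check them against your use case before shipping.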
Mistral Large 2 delivers competitive multilingual performance across European languages and solid coding results for its 123B size. The Mistral Research License, however, does not cover commercial use without a separate commercial license from Mistral AI, which is a real constraint for teams that need unrestricted deployment or fine-tuning rights. For organizations that already have a Mistral commercial agreement, its quality per token remains competitive with Qwen 3 72B.
Llama 3.1 405B operates in a different tier. At 405 billion parameters, it can handle tasks that expose the limits of smaller models: complex multi-step reasoning, long-form synthesis and advanced coding workflows. Multi-GPU infrastructure requirements and thinner provider availability make it a targeted choice rather than a default for high-volume inference. At the cheapest listed prices, it costs about 1.5x as much per token as Mistral Large 2, and roughly 12x (input) and 9x (output) as much as Qwen 3 72B.
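The multi-GPU requirement follows from simple arithmetic: the weights alone take roughly parameter count × bytes per parameter. The Python sketch below estimates that lower bound for BF16 and FP8 weights and the minimum number of GPUs needed just to hold them, assuming hypothetical 80 GB accelerators. It ignores KV cache, activations and serving overhead, so real deployments need more memory than it reports; `weight_memory_gb` and `min_gpus` are illustrative helpers, not functions from any serving framework.

```python
import math

# Approximate storage per parameter for common weight formats, in bytes.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    KV cache, activations and framework overhead are NOT included.
    """
    return params_billion * BYTES_PER_PARAM[dtype]


def min_gpus(params_billion: float, dtype: str, gpu_mem_gb: float = 80.0) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weight_memory_gb(params_billion, dtype) / gpu_mem_gb)


if __name__ == "__main__":
    models = [("Llama 3.1 405B", 405), ("Mistral Large 2", 123), ("Qwen 3 72B", 72)]
    for name, size in models:
        for dtype in ("bf16", "fp8"):
            print(
                f"{name} {dtype}: {weight_memory_gb(size, dtype):.0f} GB of weights, "
                f">= {min_gpus(size, dtype)} x 80 GB GPUs"
            )
```

By this estimate, Llama 3.1 405B needs about 810 GB for BF16 weights, more than a single node of eight 80 GB GPUs (640 GB), while FP8 weights (about 405 GB) fit within one such node. Qwen 3 72B at BF16 (about 144 GB) fits on two 80 GB GPUs before KV cache is accounted for.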
Pick Qwen 3 72B for general-purpose or multilingual production deployments. Pick Mistral Large 2 if you already operate under a Mistral commercial agreement. Pick Llama 3.1 405B only when task complexity demonstrably exceeds what the smaller models can handle and the added infrastructure cost is justified.
Frequently asked questions
- How does Llama 3.1 405B Instruct compare to Mistral Large 2 and Qwen 3 72B Instruct on price?
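- At the cheapest listed providers, Qwen 3 72B Instruct is the least expensive ($0.22 input / $0.88 output per 1M tokens), followed by Mistral Large 2 ($1.80 / $5.40) and Llama 3.1 405B Instruct ($2.70 / $8.00). See the specs table above for the provider behind each price.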
- Which model is best for coding: Llama 3.1 405B Instruct, Mistral Large 2, or Qwen 3 72B Instruct?
- No code benchmark scores (such as HumanEval) are listed on this page yet. For production code tasks, also weigh context window size, provider latency and license terms.
- What is the context window for Llama 3.1 405B Instruct, Mistral Large 2, and Qwen 3 72B Instruct?
- All three list a 131K-token context window; see the Context window row of the specs table above.