Model crosswalk
Three-way side-by-side comparison on price, capability, and workload.
Mistral Large 2 vs Nemotron-4 340B Instruct vs Qwen 3 72B Instruct
Mistral Large 2
123B params · 131K context · mistral-research
Cheapest provider: openrouter
$/1M input: $1800000.00
$/1M output: $5400000.00
Nemotron-4 340B Instruct
340B params · 4K context · nvidia-open-model
Cheapest provider: n/a
$/1M input: n/a
$/1M output: n/a
Qwen 3 72B Instruct
72B params · 131K context · qwen
Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00
Specs and cheapest providers
| Spec | Mistral Large 2 | Nemotron-4 340B Instruct | Qwen 3 72B Instruct |
|---|---|---|---|
| Parameters | 123B | 340B | 72B |
| Context window | 131K tokens | 4K tokens | 131K tokens |
| License | mistral-research | nvidia-open-model | qwen |
| Released | 2024-07-24 | 2024-06-14 | 2025-04-28 |
| Cheapest provider | openrouter | n/a | fireworks-ai |
| Input price ($/1M tokens) | $1800000.00 | n/a | $220000.00 🏆 |
| Output price ($/1M tokens) | $5400000.00 | n/a | $880000.00 🏆 |
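To compare the two priced models on a concrete workload, the per-1M-token prices can be combined into a single cost figure and then expressed as a ratio. The following is a minimal sketch, not a billing tool: the price figures are copied exactly as they appear in the table, and the function names (`workload_cost`, `cost_ratio`) and the example token counts are hypothetical choices for illustration.

```python
# Prices in USD per 1M tokens as (input, output), copied as listed in the table.
# None marks a model with no listed provider.
PRICES = {
    "Mistral Large 2": (1_800_000.00, 5_400_000.00),
    "Nemotron-4 340B Instruct": None,
    "Qwen 3 72B Instruct": (220_000.00, 880_000.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the cost of a workload, or None if the model has no listed price."""
    price = PRICES[model]
    if price is None:
        return None
    input_price, output_price = price
    return (input_tokens / 1_000_000) * input_price + (
        output_tokens / 1_000_000
    ) * output_price


def cost_ratio(model_a, model_b, input_tokens, output_tokens):
    """Return cost(model_a) / cost(model_b), or None if either price is missing."""
    cost_a = workload_cost(model_a, input_tokens, output_tokens)
    cost_b = workload_cost(model_b, input_tokens, output_tokens)
    if cost_a is None or cost_b is None:
        return None
    return cost_a / cost_b


# Hypothetical workload: 3M input tokens and 1M output tokens.
ratio = cost_ratio("Mistral Large 2", "Qwen 3 72B Instruct", 3_000_000, 1_000_000)
print(f"Mistral Large 2 costs {ratio:.2f}x Qwen 3 72B Instruct on this workload")
```

The ratio depends only on the relative prices and the input-to-output mix, so it stays meaningful even if the absolute figures are later restated in a different unit. For the 3:1 mix above, Mistral Large 2 comes out at roughly seven times the cost of Qwen 3 72B Instruct.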
Benchmark comparison
No benchmark data available yet.
Editor's take
Three publisher flagships with almost nothing in common beyond their billing tier. Mistral Large 2 is a 123B-parameter dense model released July 2024 by France-based Mistral AI, built around European multilingual quality, structured output reliability, and tight integration with Mistral's managed API. Function calling and JSON mode are first-class. The 131K-token context window is matched by few rivals at this price point. The Mistral Research License restricts self-hosted commercial use, so production workloads generally go through Mistral's own API or require an enterprise agreement.
Nemotron-4 340B Instruct is NVIDIA's flagship open release, a dense 340-billion-parameter model released June 2024. It is designed for synthetic training data generation rather than general-purpose chat: NVIDIA explicitly positioned it as a reference model for producing diverse, high-quality instruction datasets for fine-tuning smaller models. The 4K context ceiling is a hard constraint that rules it out for most document-processing and RAG use cases. Hosting is concentrated on NVIDIA's NIM service, and the NVIDIA Open Model License is not OSI-approved. If you are not building synthetic data pipelines, the cost is difficult to justify.
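The effect of the 4K ceiling on a RAG-style request can be checked with simple token arithmetic. The sketch below assumes that "131K" and "4K" mean 131,072 and 4,096 tokens, and that the prompt and the generated output share one window; both are assumptions, since the page gives only rounded figures. The helper names and the example token counts are hypothetical.

```python
# Context windows in tokens. The exact values are an assumption: the spec
# table lists only "131K" and "4K".
CONTEXT_WINDOW = {
    "Mistral Large 2": 131_072,
    "Nemotron-4 340B Instruct": 4_096,
    "Qwen 3 72B Instruct": 131_072,
}


def fits(model, prompt_tokens, max_output_tokens):
    """True if prompt plus reserved output tokens fit in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW[model]


def models_that_fit(prompt_tokens, max_output_tokens):
    """List the models whose window can hold the whole request."""
    return [
        model
        for model in CONTEXT_WINDOW
        if fits(model, prompt_tokens, max_output_tokens)
    ]


# Hypothetical RAG request: eight retrieved chunks of about 1,000 tokens each,
# 500 tokens of instructions and question, and 1,000 tokens reserved for output.
prompt_tokens = 8 * 1_000 + 500
print(models_that_fit(prompt_tokens, max_output_tokens=1_000))
```

With these numbers the request needs 9,500 tokens, so only the two 131K models qualify. A request would have to stay under roughly 3,000 prompt tokens to leave Nemotron-4 340B Instruct a 1,000-token reply.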
Qwen 3 72B Instruct is Alibaba's April 2025 flagship open model at 72 billion parameters, inheriting a 131K context window and adding meaningfully improved multilingual coverage across CJK, Arabic, and European languages over its 2.5-series predecessor. Benchmark performance is competitive with Mistral Large 2 on most English-language tasks and surpasses it on multilingual evals.
Pick Mistral Large 2 for European-language enterprise workloads with managed API support. Pick Nemotron-4 340B only for synthetic data generation at scale. Pick Qwen 3 72B as the cost-effective option when multilingual breadth and long-context support are both required.
Frequently asked questions
- How does Mistral Large 2 compare to Nemotron-4 340B Instruct and Qwen 3 72B Instruct on price?
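- Among the listed providers, Qwen 3 72B Instruct (fireworks-ai) is cheaper than Mistral Large 2 (openrouter) on both input and output. Nemotron-4 340B Instruct has no listed provider. See the table above for input and output prices per 1M tokens.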
- Which model is best for coding: Mistral Large 2, Nemotron-4 340B Instruct, or Qwen 3 72B Instruct?
- No benchmark data, including HumanEval and other code benchmarks, is available for these models yet. For production code tasks, consider context window size and provider latency.
- What is the context window for Mistral Large 2, Nemotron-4 340B Instruct, and Qwen 3 72B Instruct?
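- Mistral Large 2 and Qwen 3 72B Instruct each have a 131K-token context window; Nemotron-4 340B Instruct has a 4K-token window. These figures are listed in the Context window row of the specs table above.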