Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
DeepSeek V3 vs Mistral Large 2 vs Qwen 3 72B Instruct
A. DeepSeek V3
671B params · 131K context · deepseek
Cheapest provider: deepinfra
$/1M input: $200000.00
$/1M output: $850000.00
B. Mistral Large 2
123B params · 131K context · mistral-research
Cheapest provider: openrouter
$/1M input: $1800000.00
$/1M output: $5400000.00
C. Qwen 3 72B Instruct
72B params · 131K context · qwen
Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00
Specs and cheapest providers
| Spec | DeepSeek V3 | Mistral Large 2 | Qwen 3 72B Instruct |
|---|---|---|---|
| Parameters | 671B | 123B | 72B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | deepseek | mistral-research | qwen |
| Released | 2024-12-26 | 2024-07-24 | 2025-04-28 |
| Cheapest provider | | | |
| Provider | deepinfra | openrouter | fireworks-ai |
| Input / 1M tokens | $200000.00🏆 | $1800000.00 | $220000.00 |
| Output / 1M tokens | $850000.00🏆 | $5400000.00 | $880000.00 |
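To turn the per-1M-token prices into a workload comparison, a minimal sketch follows. The price figures are copied verbatim from the table above; the token counts (3M input, 1M output) are a hypothetical workload, and the function names are illustrative only. Because the result is expressed as a ratio against the cheapest option, it does not depend on the price unit.

```python
# Prices copied verbatim from the table above (per 1M tokens, as listed).
PRICES = {
    "DeepSeek V3": {"provider": "deepinfra", "input": 200000.00, "output": 850000.00},
    "Mistral Large 2": {"provider": "openrouter", "input": 1800000.00, "output": 5400000.00},
    "Qwen 3 72B Instruct": {"provider": "fireworks-ai", "input": 220000.00, "output": 880000.00},
}


def workload_cost(price, input_tokens, output_tokens):
    """Cost of one workload given per-1M-token input and output prices."""
    return (
        (input_tokens / 1_000_000) * price["input"]
        + (output_tokens / 1_000_000) * price["output"]
    )


def relative_costs(input_tokens, output_tokens):
    """Cost of each model as a multiple of the cheapest model for this workload."""
    costs = {
        model: workload_cost(price, input_tokens, output_tokens)
        for model, price in PRICES.items()
    }
    cheapest = min(costs.values())
    return {model: cost / cheapest for model, cost in costs.items()}


# Hypothetical workload: 3M input tokens and 1M output tokens.
ratios = relative_costs(3_000_000, 1_000_000)
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{model} ({PRICES[model]['provider']}): {ratio:.2f}x the cheapest option")
```

Changing the input-to-output mix shifts the ratios, since the three models differ more on input price than on output price.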
Benchmark comparison
No benchmark data available yet.
Editor's take
Three serious competitors released between July 2024 and April 2025, all with 131K context windows, but with meaningfully different cost trajectories heading into 2026.
DeepSeek V3 is the 671B-parameter mixture-of-experts model from December 2024, routing each token through 8 of 256 experts for roughly 37B active parameters per forward pass. At launch, it was among the most capable open models on code and math benchmarks relative to its effective inference cost. The key context in 2026: DeepSeek shipped V3.2 later in 2025 with lower inference pricing. V3 remains hosted on DeepInfra, Fireworks, and OpenRouter but is now the legacy variant; if you are starting fresh, V3.2 is the current-generation choice. Verify DeepSeek's license terms before commercial use.
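The "8 of 256 experts" figure can be illustrated with a generic top-k routing sketch. This is a textbook softmax-over-top-k gate under simplifying assumptions, not DeepSeek's actual router; the function name, the random scores, and the seed are hypothetical. The final line just restates the active-parameter share implied by the 37B and 671B figures above.

```python
import math
import random

NUM_EXPERTS = 256  # experts available to the router, per the text above
TOP_K = 8          # experts selected for each token, per the text above


def route_token(scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only; subtract the max for stability.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(scores)

# Share of parameters active per token, from the figures quoted above.
active_fraction = 37 / 671
print(f"{len(selected)} experts selected; about {active_fraction:.1%} of parameters active")
```

Only the selected experts run for a given token, which is why per-token compute tracks the roughly 37B active parameters rather than the 671B total.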
Mistral Large 2 is Mistral AI's 123B flagship from July 2024, positioned as a strong general-purpose model with competitive MMLU and coding scores. It performs well on French and other European-language benchmarks relative to peers, reflecting Mistral's European origin. It is hosted through Mistral's own API and selected providers. The weights are released under the Mistral Research License; commercial deployment is available through Mistral's API.
Qwen 3 72B Instruct is Alibaba's April 2025 model, the newest of the three, with strong multilingual coverage that spans CJK and Arabic alongside competitive MMLU and HumanEval scores. At 72B total parameters it is lighter to host than V3 (671B) or Mistral Large 2 (123B) when counting full rather than active parameters, although the table above lists DeepSeek V3 as slightly cheaper per token at its cheapest provider. Provider coverage on mainstream platforms is wide.
Pick DeepSeek V3.2 (over V3) when MoE inference efficiency and top coding benchmarks are the priority. Pick Mistral Large 2 when European-language quality and Mistral's API ecosystem are relevant. Pick Qwen 3 72B for multilingual breadth and the best cost-to-capability ratio at the 72B tier.
Frequently asked questions
- How does DeepSeek V3 compare to Mistral Large 2 and Qwen 3 72B Instruct on price?
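- At the cheapest listed providers, DeepSeek V3 (deepinfra) has the lowest input and output prices, Qwen 3 72B Instruct (fireworks-ai) is slightly higher, and Mistral Large 2 (openrouter) is the most expensive. See the table above for the exact prices per 1M tokens.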
- Which model is best for coding: DeepSeek V3, Mistral Large 2, or Qwen 3 72B Instruct?
- No benchmark data, including HumanEval and other code benchmarks, is available on this page yet. For production code tasks, also consider context window size and provider latency.
- What is the context window for DeepSeek V3, Mistral Large 2, and Qwen 3 72B Instruct?
- All three models have a 131K-token context window, as listed in the Context window row of the specs table above.