Mistral Large 2 vs Qwen 3 72B Instruct (2026) — pricing, benchmarks, cheapest providers

Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Model A: Mistral Large 2

123B params · 131K context · mistral-research

Cheapest provider: openrouter

$/1M input: $1800000.00

$/1M output: $5400000.00

Model B: Qwen 3 72B Instruct

72B params · 131K context · qwen

Cheapest provider: fireworks-ai

$/1M input: $220000.00

$/1M output: $880000.00

Specs and cheapest providers

Spec	Mistral Large 2	Qwen 3 72B Instruct
Parameters	123B	72B
Context window	131K tokens	131K tokens
License	mistral-research	qwen
Released	2024-07-24	2025-04-28
Cheapest provider	openrouter	fireworks-ai
Input / 1M tokens	$1800000.00	$220000.00 🏆
Output / 1M tokens	$5400000.00	$880000.00 🏆

Rankings for Qwen 3 72B Instruct: #5 in cheapest output · #10 in fastest TTFT · #10 in highest throughput


Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider

Mistral Large 2: $19800000.00 /mo

Qwen 3 72B Instruct: $2860000.00 /mo

What changes at scale

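Output tokens dominate cost once output volume exceeds roughly one third of input volume for Mistral Large 2 (output is priced at 3x input) or one quarter for Qwen 3 72B Instruct (output is priced at 4x input). Below those thresholds, input dominates and cheaper-input providers win regardless of headline price.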
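The break-even point between input and output spend can be worked out from the per-1M prices listed on this page. The derivation below is a sketch; the symbols $p_{\text{in}}$, $p_{\text{out}}$, $T_{\text{in}}$ and $T_{\text{out}}$ are introduced here for illustration and do not appear in the original page.

```latex
% p_in, p_out: listed price per 1M tokens; T_in, T_out: monthly tokens in millions
\begin{align*}
C &= p_{\text{in}} T_{\text{in}} + p_{\text{out}} T_{\text{out}} \\
p_{\text{out}} T_{\text{out}} > p_{\text{in}} T_{\text{in}}
  &\iff \frac{T_{\text{out}}}{T_{\text{in}}} > \frac{p_{\text{in}}}{p_{\text{out}}} \\
\text{Mistral Large 2:}\quad \frac{p_{\text{in}}}{p_{\text{out}}} &= \frac{1800000}{5400000} = \frac{1}{3} \\
\text{Qwen 3 72B Instruct:}\quad \frac{p_{\text{in}}}{p_{\text{out}}} &= \frac{220000}{880000} = \frac{1}{4}
\end{align*}
```

For the 1M in · 250K out workload, the output share is 1/4 of input volume: input spend is still the larger half for Mistral Large 2, while input and output spend are equal for Qwen 3 72B Instruct.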

Workload	Mistral Large 2	Qwen 3 72B Instruct
1M in · 250K out	$3150000.00	$440000.00
5M in · 2M out	$19800000.00	$2860000.00
20M in · 10M out	$90000000.00	$13200000.00
100M in · 60M out	$504000000.00	$74800000.00
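The workload figures above follow from a simple linear formula: tokens times the per-1M price, summed over input and output. The Python sketch below reproduces them under that assumption. The prices are copied exactly as listed on this page, and the names `PRICES`, `WORKLOADS`, `monthly_cost` and `cost_table` are hypothetical helpers, not part of any provider API.

```python
# Prices per 1M tokens, copied as listed on this page for each model's
# cheapest provider (openrouter and fireworks-ai respectively).
PRICES = {
    "Mistral Large 2": {"input": 1800000.00, "output": 5400000.00},
    "Qwen 3 72B Instruct": {"input": 220000.00, "output": 880000.00},
}

# (input tokens, output tokens) per month, matching the rows on this page.
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a token mix at the model's listed per-1M prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def cost_table() -> list[tuple[int, int, float, float]]:
    """One row per workload: (input, output, Mistral cost, Qwen cost)."""
    return [
        (
            tokens_in,
            tokens_out,
            monthly_cost("Mistral Large 2", tokens_in, tokens_out),
            monthly_cost("Qwen 3 72B Instruct", tokens_in, tokens_out),
        )
        for tokens_in, tokens_out in WORKLOADS
    ]


if __name__ == "__main__":
    for tokens_in, tokens_out, mistral, qwen in cost_table():
        print(f"{tokens_in:>11,} in · {tokens_out:>10,} out  "
              f"${mistral:.2f} · ${qwen:.2f}")
```

Swapping in another provider's prices only requires editing the `PRICES` entries; the formula itself does not model volume discounts, caching, or minimum fees, none of which this page describes.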

Capability vs price

[Scatter plot: benchmark score × $/1M output tokens]


Editor's take

[Mistral Large 2](/models/mistralai--mistral-large-2) is a 123B dense model; [Qwen 3 72B Instruct](/models/alibaba--qwen-3-72b-instruct) is Alibaba's third-generation 72B model with a hybrid thinking architecture that can toggle extended chain-of-thought reasoning per request. Pricing on Qwen 3 72B typically lands 35–50% below Mistral Large 2 on output tokens, but that gap narrows significantly when you enable thinking mode, which multiplies token count.

Qwen 3 72B's thinking mode changes the comparison dynamic. For tasks like multi-step math, competitive coding, and complex logical deduction, it can match or exceed Mistral Large 2 quality by spending more tokens internally. In non-thinking mode it's a cost-efficient general model; in thinking mode it trades latency and cost for depth.

**Where [Mistral Large 2](/models/mistralai--mistral-large-2) wins:** Latency-sensitive production workloads where you can't afford 2–5x token overhead from extended reasoning. Dense document processing, real-time chat, and structured extraction tasks where consistent, fast inference matters more than occasional deeper reasoning.

**Where Qwen 3 72B Instruct wins:** Developer environments, agentic loops requiring genuine multi-step problem solving, and code generation tasks where you can selectively enable thinking mode for hard subtasks. Its stronger math and reasoning benchmarks in thinking mode make it the right pick when answer quality outweighs inference cost.

Pick Mistral Large 2 if you need predictable low-latency outputs across varied workloads without per-request reasoning overhead. Pick Qwen 3 72B Instruct if you're building reasoning-heavy agents or want a dual-mode model that scales quality with compute budget.
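To see how much a 2–5x thinking-mode overhead narrows the price gap, one rough approach is to scale the output price by the token multiplier. The Python sketch below does this with the cheapest-provider output prices listed on this page. It assumes that reasoning tokens are billed at the normal output rate, which this page does not state, and the function names are hypothetical.

```python
# Output prices per 1M tokens, copied as listed on this page
# for each model's cheapest provider.
MISTRAL_OUTPUT_PRICE = 5400000.00
QWEN_OUTPUT_PRICE = 880000.00


def effective_output_price(price_per_1m: float, token_multiplier: float) -> float:
    """Price per 1M answer tokens when each answer bills `token_multiplier`
    times as many output tokens (assumption: thinking tokens are billed
    at the same rate as ordinary output tokens)."""
    if token_multiplier < 1:
        raise ValueError("token_multiplier must be at least 1")
    return price_per_1m * token_multiplier


def break_even_multiplier(cheaper_price: float, pricier_price: float) -> float:
    """Token multiplier at which the cheaper model's effective output price
    reaches the pricier model's listed output price."""
    return pricier_price / cheaper_price


if __name__ == "__main__":
    for multiplier in (1, 2, 5):
        qwen = effective_output_price(QWEN_OUTPUT_PRICE, multiplier)
        print(f"{multiplier}x overhead: Qwen ${qwen:.2f} vs "
              f"Mistral ${MISTRAL_OUTPUT_PRICE:.2f} per 1M answer tokens")
    limit = break_even_multiplier(QWEN_OUTPUT_PRICE, MISTRAL_OUTPUT_PRICE)
    print(f"break-even overhead: {limit:.2f}x")
```

Under these assumptions the gap shrinks as the multiplier grows, and the break-even multiplier is the ratio of the two listed output prices. The sketch covers output price only; it says nothing about the added latency of thinking mode.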
