Qwen 2.5 72B Instruct vs Qwen 3 72B Instruct (2026) — pricing, benchmarks, cheapest providers

Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Qwen 2.5 72B Instruct (A)

72B params · 131K context · qwen

Cheapest provider: deepinfra

$/1M input: $180000.00

$/1M output: $350000.00

Qwen 3 72B Instruct (B)

72B params · 131K context · qwen

Cheapest provider: fireworks-ai

$/1M input: $220000.00

$/1M output: $880000.00

Specs and cheapest providers

Spec	Qwen 2.5 72B Instruct	Qwen 3 72B Instruct
Parameters	72B	72B
Context window	131K tokens	131K tokens
License	qwen	qwen
Released	2024-09-19	2025-04-28
Cheapest provider	deepinfra	fireworks-ai
Input / 1M tokens	$180000.00 🏆	$220000.00
Output / 1M tokens	$350000.00 🏆	$880000.00

#6 Qwen 2.5 72B Instruct in cheapest input
#5 Qwen 3 72B Instruct in cheapest output
#7 Qwen 2.5 72B Instruct in cheapest output
#9 Qwen 2.5 72B Instruct in fastest TTFT
#10 Qwen 3 72B Instruct in fastest TTFT
#9 Qwen 2.5 72B Instruct in highest throughput
#10 Qwen 3 72B Instruct in highest throughput
#4 Qwen 2.5 72B Instruct in best MMLU
#4 Qwen 2.5 72B Instruct in best HumanEval

Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider

Qwen 2.5 72B Instruct

$1600000.00 /mo

Qwen 3 72B Instruct

$2860000.00 /mo
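
The monthly figures above follow directly from the per-1M-token prices. A minimal sketch of that arithmetic, assuming the listed prices are taken verbatim from this page; the `PRICES` dictionary and the `monthly_cost` helper are hypothetical names used for illustration only:

```python
# Prices copied verbatim from the listing above (USD per 1M tokens).
# PRICES and monthly_cost are illustrative names, not part of any provider API.
PRICES = {
    "Qwen 2.5 72B Instruct": {"input": 180000.00, "output": 350000.00},  # deepinfra
    "Qwen 3 72B Instruct": {"input": 220000.00, "output": 880000.00},    # fireworks-ai
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return monthly spend for one model given raw token counts."""
    price = PRICES[model]
    input_cost = input_tokens / 1_000_000 * price["input"]
    output_cost = output_tokens / 1_000_000 * price["output"]
    return input_cost + output_cost


# Sample workload from this section: 5M input + 2M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 5_000_000, 2_000_000):.2f} /mo")
```

Swapping in other token counts gives the same per-model comparison for a different workload mix.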

What changes at scale

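Which side of the bill dominates depends on each model's input-to-output price ratio, not on a single fixed token ratio. At the prices listed above, output spend overtakes input spend once output volume exceeds roughly 51% of input volume for Qwen 2.5 72B Instruct ($180000.00 / $350000.00) and 25% for Qwen 3 72B Instruct ($220000.00 / $880000.00). Below those thresholds input dominates, and the provider with the cheaper input price wins regardless of headline output price.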

Workload	Qwen 2.5 72B Instruct	Qwen 3 72B Instruct
1M in · 250K out	$267500.00	$440000.00
5M in · 2M out	$1600000.00	$2860000.00
20M in · 10M out	$7100000.00	$13200000.00
100M in · 60M out	$39000000.00	$74800000.00
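
A minimal sketch for checking where output spend overtakes input spend, using the listed cheapest-provider prices and the four workload rows above. The helper names `break_even_ratio` and `output_share` are hypothetical, and the prices are copied from this page as-is:

```python
# Prices copied verbatim from the listing above (USD per 1M tokens).
# Each tuple is (input price, output price); names here are illustrative only.
PRICES = {
    "Qwen 2.5 72B Instruct": (180000.00, 350000.00),  # deepinfra
    "Qwen 3 72B Instruct": (220000.00, 880000.00),    # fireworks-ai
}

# Workload rows from this section: (input tokens, output tokens).
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def break_even_ratio(input_price: float, output_price: float) -> float:
    """Output-to-input token ratio at which output spend equals input spend."""
    return input_price / output_price


def output_share(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Fraction of the total bill that comes from output tokens."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return output_cost / (input_cost + output_cost)


for name, (p_in, p_out) in PRICES.items():
    ratio = break_even_ratio(p_in, p_out)
    print(f"{name}: break-even at {ratio:.2f} output tokens per input token")
    for n_in, n_out in WORKLOADS:
        share = output_share(n_in, n_out, p_in, p_out)
        print(f"  {n_in:>11,} in / {n_out:>10,} out: output is {share:.0%} of cost")
```

On the 5M in / 2M out row, for example, output accounts for about 44% of the Qwen 2.5 72B Instruct bill and about 62% of the Qwen 3 72B Instruct bill, so the same workload can be input-dominated on one model and output-dominated on the other.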

Capability vs price

[Scatter plot: benchmark score vs. output price ($/1M tokens)]

Editor's take

Qwen 3 72B Instruct is the direct successor to Qwen 2.5 72B Instruct; both are dense 72B models from Alibaba, but the third-generation model ships with meaningful improvements. Qwen 3 72B pushes MMLU from ~86 (2.5) to ~89–90, gains a 128K context window (vs 32K for 2.5), and improves on math and coding benchmarks: HumanEval moves from ~86% to ~90%+. The price premium for Qwen 3 72B is real but modest: roughly $0.80–$1.40/M tokens vs $0.60–$1.10/M for Qwen 2.5 72B across major providers. For teams already running Qwen 2.5 72B, the migration path is straightforward (same tokenizer family, compatible prompt formats), so switching involves minimal integration overhead.

**Where Qwen 2.5 72B wins:** workloads where the lower price is the primary driver and existing MMLU ~86 quality is already meeting SLA. It has broader provider availability and is more likely to be available on legacy GPU SKUs with proven uptime records.

**Where Qwen 3 72B wins:** any workload pushing the boundaries of the 2.5 generation: longer documents that exceed 32K tokens, math and reasoning tasks where the accuracy delta matters, or new deployments where there's no sunk cost in the older model.

Pick [Qwen 2.5 72B Instruct](/models/alibaba--qwen-2.5-72b-instruct) when you're already on it and the cost savings outweigh the quality increment. Pick [Qwen 3 72B Instruct](/models/alibaba--qwen-3-72b-instruct) for new deployments or any workload where the 128K context window or improved reasoning benchmarks pay off.
