alibaba · qwen family · Verified May 27, 2026

Qwen 2.5 72B Instruct

72B params · 131K context · qwen · released 2024-09 · 4 providers

Qwen 2.5 72B Instruct is the previous-generation flagship from Alibaba, a 72-billion-parameter model with a 131K context window released in September 2024. Superseded by the Qwen 3 generation in April 2025, it remains widely deployed because workloads pinned to a specific checkpoint for reproducibility rarely migrate quickly: inference pipelines with cached prompts, fine-tuned adapters, or benchmarked output distributions commonly run a generation behind for six months or more. Benchmark scores on MMLU, HumanEval, and multilingual evals remain competitive against current-generation models in its parameter class. The Qwen license covers commercial use. Pricing across hosted providers tracks closely with comparable Qwen 3 models for now, though 2.5 rates are likely to soften as provider capacity shifts toward the newer generation.


Cheapest now: $0.35 per 1M output tokens (DeepInfra, fp16)

90-day price Δ (cheapest-host trend): no data

Fastest TTFT: 390 ms (Fireworks AI, 240 tok/s)

Providers: 4 indexed

Ranks: #3 Cheapest input · #3 Cheapest output · #2 Blended monthly cost · #2 Fastest TTFT · #2 Fastest throughput · #2 Context window · #2 MMLU score · #2 HumanEval score · #2 Provider count


#	Provider	$/1M in	$/1M out	TTFT p50	tok/s	Quant
1	DeepInfra (cheapest)	$0.18	$0.35	400 ms	200	fp16
2	Fireworks AI (fastest)	$0.20	$0.80	390 ms	240	fp16
3	OpenRouter	$0.22	$0.75	n/a	n/a	unknown
4	Hyperbolic	$0.40	$0.40	500 ms	105	fp16

Publisher: alibaba

Parameters: 72B

Context window: 131k tokens

License: qwen

Released: 2024-09-19

Family: qwen

Pricing by quantization

Provider	Input / 1M	Output / 1M	Tok/s
DeepInfra	$0.1800	$0.3500	200
Fireworks AI	$0.2000	$0.8000	240
Hyperbolic	$0.4000	$0.4000	105

Questions developers ask

How much does it cost to run Qwen 2.5 72B Instruct for 100M tokens?

Running Qwen 2.5 72B Instruct with 100M input and 10M output tokens per month costs approximately $21.50 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
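The arithmetic behind that figure can be reproduced with a minimal sketch. The function name `monthly_cost_usd` and its parameters are hypothetical, chosen for illustration; the prices are the DeepInfra fp16 rates listed on this page as of May 27, 2026, and the sketch ignores prompt caching or any volume discounts.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost in USD for a token volume, given prices per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * usd_per_m_in
    output_cost = (output_tokens / 1_000_000) * usd_per_m_out
    return input_cost + output_cost


# DeepInfra fp16 prices from this page: $0.18 in, $0.35 out per 1M tokens.
cost = monthly_cost_usd(100_000_000, 10_000_000, 0.18, 0.35)
print(f"${cost:.2f}")  # $21.50
```

The input side contributes $18.00 and the output side $3.50, so at this 10:1 ratio the input price dominates the bill.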

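What is the cheapest provider for Qwen 2.5 72B Instruct?

DeepInfra currently offers Qwen 2.5 72B Instruct at the lowest total cost. In the provider table above it has both the lowest input price ($0.18 per 1M tokens) and the lowest output price ($0.35 per 1M tokens), so it is the cheapest host at any input/output ratio. Prices change frequently, so check the provider table above for the latest data.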
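To compare providers for a specific workload rather than a generic one, a rough sketch like the following ranks the four hosts by total cost. The `PRICES` dictionary and the `rank_providers` helper are illustrative names, not part of any provider API; the numbers are copied from the provider table on this page and will go stale as prices change.

```python
# Per-1M-token prices in USD, copied from the provider table: (input, output).
PRICES = {
    "DeepInfra": (0.18, 0.35),
    "Fireworks AI": (0.20, 0.80),
    "OpenRouter": (0.22, 0.75),
    "Hyperbolic": (0.40, 0.40),
}


def rank_providers(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Return (provider, cost) pairs sorted cheapest first.

    Volumes are expressed in millions of tokens.
    """
    costs = {
        name: input_m * price_in + output_m * price_out
        for name, (price_in, price_out) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Example workload: 100M input tokens and 10M output tokens per month.
ranked = rank_providers(100, 10)
for name, cost in ranked:
    print(f"{name}: ${cost:.2f}")
```

For the 100M input / 10M output workload this puts DeepInfra first at $21.50, followed by Fireworks AI at $28.00, OpenRouter at $29.50, and Hyperbolic at $44.00. An output-heavy workload changes the gaps but not the leader, since DeepInfra is cheapest on both sides.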

What context window does Qwen 2.5 72B Instruct support?

Qwen 2.5 72B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower.

What's the cheapest way to run Qwen 2.5 72B Instruct?

The cheapest way to run Qwen 2.5 72B Instruct is via DeepInfra, starting at $0.18 per million input tokens. If your workload is prompt-heavy, enabling prompt caching can reduce costs further.

Is there a free tier for Qwen 2.5 72B Instruct?

Free tiers vary by provider and change frequently. Check each provider's current pricing page for trial credits or free-tier limits. The prices shown on this page reflect paid API access.

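How much does Qwen 2.5 72B Instruct cost for 1M tokens?

Qwen 2.5 72B Instruct pricing starts at $0.18 per million input tokens and $0.35 per million output tokens on DeepInfra. Output tokens cost between 1× and 4× the input price depending on the provider: roughly 1.9× on DeepInfra, 3.4× on OpenRouter, 4× on Fireworks AI, and 1× on Hyperbolic, which charges $0.40 for both.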


Methodology: prices scraped nightly from public pricing pages; snapshots append-only.

Prices verified May 27, 2026 by scraping 4 providers.

Provider	Input / 1M	Output / 1M	Tok/s
DeepInfra	$0.3600	$0.4000	—
OpenRouter	$0.2200	$0.7500	—
