Qwen 3 72B Instruct — pricing, providers, and benchmarks

Parameters: 72B
Context window: 131k tokens
License: qwen
Released: 2025-04-28

Alibaba's Qwen 3 72B Instruct is the most credible alternative to Llama 3.3 70B in the same parameter class. Released on 2025-04-28, it ships with stronger multilingual support (particularly for Chinese, Japanese, Korean, and Arabic) and slightly better performance on long-context retrieval tasks. Pricing on hosted providers is broadly similar to Llama 3.3 70B: the providers listed below charge $0.22–$0.29 per 1M input tokens and $0.29–$0.88 per 1M output tokens. It is worth choosing over Llama for non-English-heavy workloads, code generation with non-English comments, or multilingual chat applications. For English-only deployments the choice between the two is largely a coin flip; pick whichever is priced more favorably on your preferred provider.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

Provider       Input / 1M   Output / 1M   Monthly cost   Context
DeepInfra      $0.2300      $0.4500       $27.50         131k
Fireworks AI   $0.2200      $0.8800       $30.80         131k
Together AI    $0.2900      $0.2900       $31.90         131k
OpenRouter     $0.2700      $0.8500       $35.50         131k
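
The monthly cost column can be reproduced from the per-1M prices. The following is a minimal sketch, assuming cost is simply input price times input volume plus output price times output volume, with no caching discounts, minimum fees, or volume tiers. The names PRICES, monthly_cost, and ranked are illustrative and not part of any provider's API.

```python
# Per-1M-token prices in USD (input, output), copied from the table above.
PRICES = {
    "DeepInfra": (0.23, 0.45),
    "Fireworks AI": (0.22, 0.88),
    "Together AI": (0.29, 0.29),
    "OpenRouter": (0.27, 0.85),
}


def monthly_cost(input_price, output_price, input_tokens_m=100, output_tokens_m=10):
    """Return the monthly cost in USD; token volumes are given in millions."""
    return input_price * input_tokens_m + output_price * output_tokens_m


def ranked(prices=PRICES, input_tokens_m=100, output_tokens_m=10):
    """Return (provider, cost) pairs sorted from cheapest to most expensive."""
    rows = [
        (name, round(monthly_cost(inp, out, input_tokens_m, output_tokens_m), 2))
        for name, (inp, out) in prices.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, cost in ranked():
        print(f"{name:<14} ${cost:.2f}")
```

Calling ranked() with other volumes re-sorts the providers for a different workload, which is useful because the ordering depends on the mix of input and output tokens.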

Frequently asked questions

How much does it cost to run Qwen 3 72B Instruct for 100M tokens?

Running Qwen 3 72B Instruct with 100M input and 10M output tokens per month costs approximately $27.50 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
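
To illustrate how the input/output ratio changes the answer, here is a small sketch that picks the cheapest provider for a given token mix. It assumes the list prices from the table above and plain per-token billing; prompt caching is not modeled. PRICES and cheapest are hypothetical names chosen for this example.

```python
# Per-1M-token prices in USD (input, output), copied from the pricing table.
PRICES = {
    "DeepInfra": (0.23, 0.45),
    "Fireworks AI": (0.22, 0.88),
    "Together AI": (0.29, 0.29),
    "OpenRouter": (0.27, 0.85),
}


def cheapest(input_tokens_m, output_tokens_m, prices=PRICES):
    """Return (provider, monthly USD cost) for the lowest-cost provider.

    Token volumes are given in millions of tokens per month.
    """
    costs = {
        name: round(inp * input_tokens_m + out * output_tokens_m, 2)
        for name, (inp, out) in prices.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


if __name__ == "__main__":
    # Input-heavy, the page's reference workload, and output-heavy mixes.
    for mix in [(100, 1), (100, 10), (10, 100)]:
        print(mix, cheapest(*mix))
```

Under these assumptions DeepInfra wins the reference workload, but Together AI becomes cheaper than DeepInfra once output volume exceeds 37.5% of input volume (0.23·I + 0.45·O = 0.29·I + 0.29·O gives O/I = 0.375), and Fireworks AI edges ahead for very input-heavy workloads.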

What is the cheapest provider for Qwen 3 72B Instruct?

DeepInfra currently offers Qwen 3 72B Instruct at the lowest total cost for the reference workload of 100M input and 10M output tokens per month. Prices change frequently, so check the table above for the latest data.

What context window does Qwen 3 72B Instruct support?

Qwen 3 72B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower; all four providers in the pricing table currently list the full 131k.