
Model crosswalk

A three-way, side-by-side comparison on price, capability, and workload.

Deepseek R1 vs Deepseek V3.2 vs Qwen 3 72b Instruct
A. Deepseek R1

Cheapest provider
$/1M input
$/1M output
B. Deepseek V3.2

Cheapest provider
$/1M input
$/1M output
C. Qwen 3 72b Instruct

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
Spec | Deepseek R1 | Deepseek V3.2 | Qwen 3 72b Instruct
Parameters
Context window
License
Released
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
Benchmark comparison

No benchmark data available yet.

Editor's take
A reasoning specialist, a cost-efficient MoE generalist, and a balanced multilingual 72B: three distinct profiles that serve different task requirements.

DeepSeek R1 applies reinforcement learning to produce explicit chain-of-thought reasoning traces, making it the strongest of the three on GPQA Diamond and competition-level mathematics. The reasoning process adds tokens and latency. That cost is appropriate when auditability of the reasoning chain matters or when the task genuinely requires multi-step derivation. R1 is not the right tool for short-form completions or classification tasks, where the chain-of-thought overhead is wasted. The context window is 131K tokens. Verify DeepSeek's commercial license before production deployment.

DeepSeek V3.2 is the general-capability MoE from May 2025, routing each token through a subset of a large expert pool for roughly 37B active parameters per pass. On coding, math, and general-reasoning benchmarks it delivers top-tier performance at a per-token cost roughly 30% below the V3 baseline. The 131K context window is accessible. Provider coverage spans DeepInfra, Fireworks, and OpenRouter. The same license caveat as R1 applies.

Qwen 3 72B Instruct, from April 2025, is a dense 72B model from Alibaba with strong MMLU, HumanEval, and multilingual scores. Its CJK and Arabic coverage is materially better than that of either DeepSeek model. It runs on mainstream providers including Together AI, Fireworks, and Groq, and the Qwen license supports commercial use. Its per-token cost is predictable and stable in a way that MoE inference sometimes is not under load.

Pick DeepSeek R1 for hard reasoning tasks where GPQA-class performance and reasoning transparency justify the token overhead. Pick DeepSeek V3.2 for strong general-purpose performance at the best cost-efficiency ratio. Pick Qwen 3 72B for multilingual workloads, predictable latency, or when mainstream provider coverage matters.
Frequently asked questions
How does Deepseek R1 compare to Deepseek V3.2 and Qwen 3 72b Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
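As a rough illustration of how per-1M-token prices translate into per-request cost, the sketch below multiplies token counts by input and output prices. The `Pricing` class, the `request_cost` function, and the prices used are all hypothetical placeholders, not quotes from any provider. It also assumes that reasoning tokens are billed as output tokens, which should be checked against the provider's billing terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (placeholder values, not real quotes)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request.

    output_tokens should include any reasoning tokens the provider bills for
    (assumed here to be billed at the output rate).
    """
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Hypothetical prices for illustration only; substitute values from the table.
example = Pricing(input_per_million=0.50, output_per_million=2.00)

# A 2,000-token prompt with a 500-token answer...
short_answer = request_cost(example, 2_000, 500)
# ...versus the same prompt where the model also emits 4,000 reasoning tokens.
with_reasoning = request_cost(example, 2_000, 500 + 4_000)

print(f"short answer:   ${short_answer:.4f}")
print(f"with reasoning: ${with_reasoning:.4f}")
```

With these placeholder numbers, the reasoning trace dominates the bill, which is why output price and expected output length matter more than input price when comparing a reasoning model against a non-reasoning one.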
Which model is best for coding: Deepseek R1, Deepseek V3.2, or Qwen 3 72b Instruct?
When available, HumanEval and other code benchmark scores appear in the benchmark comparison above. For production code tasks, also consider context window size and provider latency.
What is the context window for Deepseek R1, Deepseek V3.2, and Qwen 3 72b Instruct?
Context window sizes are listed in the Context window row of the specs table above.