0 providers50 models

Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

DeepSeek V3
vs
DeepSeek V3.2
vs
Qwen 3 72B Instruct
DeepSeek V3A

DeepSeek V3

671B params · 131K context · deepseek

Cheapest providerdeepinfra
$/1M input$200000.00
$/1M output$850000.00
DeepSeek V3.2B

DeepSeek V3.2

671B params · 131K context · deepseek

Cheapest provider
$/1M input
$/1M output
Qwen 3 72B InstructC

Qwen 3 72B Instruct

72B params · 131K context · qwen

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
SpecDeepSeek V3DeepSeek V3.2Qwen 3 72B Instruct
Parameters671B671B72B
Context window131K tokens131K tokens131K tokens
Licensedeepseekdeepseekqwen
Released2024-12-262025-05-072025-04-28
Cheapest provider
Providerdeepinfra
Input / 1M tokens$200000.00
Output / 1M tokens$850000.00
Benchmark comparison

No benchmark data available yet.

Editor's take
Two generations of the same MoE architecture against Alibaba's current-generation 72B, making this comparison more about versioning decisions than fundamental capability gaps. DeepSeek V3 is the December 2024 release — a 671B mixture-of-experts model routing tokens through 8 of 256 experts for roughly 37B active parameters per pass. At launch it was one of the most capable open models on coding, math, and general reasoning benchmarks relative to its serving cost. The 131K context window and broad third-party hosting made it widely adopted. In 2026, V3 is now the legacy variant: the same providers that hosted V3 have largely migrated available capacity toward V3.2, and choosing V3 requires intentional version pinning. DeepSeek V3.2, released May 2025, reduces inference pricing roughly 30% compared to V3 while maintaining comparable or improved benchmark scores. If you are choosing between the two DeepSeek variants without a version-pinning requirement, V3.2 is the correct answer. The commercial license terms for both require the same verification. Both expose a 131K context window. Qwen 3 72B Instruct from April 2025 is a dense 72B model — not MoE — which means different operational characteristics than either DeepSeek variant. Active parameters per forward pass are the full 72B, making per-token cost roughly comparable to V3.2 on many providers while offering more predictable latency variance. Multilingual coverage, especially CJK and Arabic, is where Qwen 3 72B consistently outperforms the DeepSeek V3 line. Strong MMLU and HumanEval scores, Qwen commercial license. Pick DeepSeek V3.2 when you want the best benchmark-per-dollar on code and math tasks and the MoE architecture is acceptable. Pick DeepSeek V3 only if version-pinning requirements force it. Pick Qwen 3 72B for multilingual workloads, more predictable latency, and broader mainstream provider coverage.
Compare two at a time
Frequently asked questions
How does DeepSeek V3 compare to DeepSeek V3.2 and Qwen 3 72B Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
Which model is best for coding: DeepSeek V3, DeepSeek V3.2, or Qwen 3 72B Instruct?
HumanEval and other code benchmarks are shown in the table. For production code tasks, also consider context window size and provider latency.
What is the context window for DeepSeek V3, DeepSeek V3.2, and Qwen 3 72B Instruct?
Context window sizes are listed in the Specs row of the comparison table above.
Full model details