Model crosswalk
Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.
DeepSeek V3 vs DeepSeek V3.2
A: DeepSeek V3
671B params · 131K context · deepseek
Cheapest provider: deepinfra
$/1M input: $200000.00
$/1M output: $850000.00
B: DeepSeek V3.2
671B params · 131K context · deepseek
Cheapest provider: together-ai
$/1M input: $270000.00
$/1M output: $1100000.00
Specs and cheapest providers
| Spec | DeepSeek V3 | DeepSeek V3.2 |
|---|---|---|
| Parameters | 671B | 671B |
| Context window | 131K tokens | 131K tokens |
| License | deepseek | deepseek |
| Released | 2024-12-26 | 2025-05-07 |
| **Cheapest provider** | | |
| Provider | deepinfra | together-ai |
| Input / 1M tokens | $200000.00 🏆 | $270000.00 |
| Output / 1M tokens | $850000.00 🏆 | $1100000.00 |
Benchmark comparison
No benchmark data available for either model yet.
Sample workload — 5M in + 2M out per month
using each model's cheapest provider
What changes at scale
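At the prices listed above, an output token costs roughly four times as much as an input token (4.25× for DeepSeek V3, about 4.07× for DeepSeek V3.2), so output spend overtakes input spend once output volume exceeds roughly a quarter of input volume. Only for input-heavy mixes beyond about 4:1 input to output does the input rate dominate, and there the provider with the cheaper input price wins regardless of headline price.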
| Monthly workload | DeepSeek V3 | DeepSeek V3.2 |
|---|---|---|
| 1M in · 250K out | $412500.00 | $545000.00 |
| 5M in · 2M out | $2700000.00 | $3550000.00 |
| 20M in · 10M out | $12500000.00 | $16400000.00 |
| 100M in · 60M out | $71000000.00 | $93000000.00 |
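The workload figures above follow directly from the per-million-token prices: input volume times the input rate plus output volume times the output rate. The sketch below reproduces that arithmetic. The prices are copied as listed on this page, while the function names (`monthly_cost`, `output_share`) and the dictionary layout are illustrative assumptions, not part of any provider API.

```python
# Prices in $ per 1M tokens, copied as listed in the spec table above.
PRICES = {
    "DeepSeek V3": {"input": 200000.00, "output": 850000.00},
    "DeepSeek V3.2": {"input": 270000.00, "output": 1100000.00},
}

# (input, output) volumes in millions of tokens per month, from the rows above.
WORKLOADS = [(1, 0.25), (5, 2), (20, 10), (100, 60)]


def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Total monthly spend for a token mix expressed in millions of tokens."""
    price = PRICES[model]
    return input_m * price["input"] + output_m * price["output"]


def output_share(model: str, input_m: float, output_m: float) -> float:
    """Fraction of the monthly spend that comes from output tokens."""
    output_cost = output_m * PRICES[model]["output"]
    return output_cost / monthly_cost(model, input_m, output_m)


for input_m, output_m in WORKLOADS:
    for model in PRICES:
        total = monthly_cost(model, input_m, output_m)
        share = output_share(model, input_m, output_m)
        print(f"{input_m}M in / {output_m}M out | {model}: "
              f"${total:.2f} ({share:.0%} from output)")
```

Swapping in your own provider's rates or token mix only requires editing `PRICES` and `WORKLOADS`; the output-share column shows how much of the bill is driven by the output rate for each mix.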
Capability vs price
[Scatter plot: benchmark score vs. $/1M output tokens]
Editor's take
DeepSeek V3.2 is an in-place architectural revision of DeepSeek V3. It keeps the same 671B sparse MoE frame (~37B active parameters) while trimming inference cost by roughly 30% versus its predecessor. If you're routing production traffic through V3 today, V3.2 is not a lateral swap: it is the same model family with meaningfully lower per-token rates at most providers. The cheapest listings compared above are an exception, with V3 on deepinfra priced below V3.2 on together-ai for both input and output.
Both models use Multi-Head Latent Attention and DeepSeek's FP8 mixed-precision training pipeline. The difference is mostly in post-training: V3.2 received additional instruction-following refinement and updated RLHF data, which narrows the gap to frontier reasoning models on coding and math benchmarks without inflating active-parameter count.
[DeepSeek V3](/models/deepseek--deepseek-v3) holds a pricing advantage on providers that have not yet migrated their weights — you may find lower spot rates on burst capacity tied to the older checkpoint. For batch inference jobs where latency is unconstrained, the rate difference can offset V3.2's benchmark gains and make V3 the cheaper option per correct output.
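"Cheaper per correct output" can be made concrete without benchmark numbers by asking how much accuracy the cheaper model can give up before its price advantage disappears. The sketch below is a minimal illustration under stated assumptions: the two monthly totals are taken from the 5M in + 2M out workload above, accuracy is modeled as a single pass-rate fraction, and the function names are hypothetical. No accuracy figures are assumed for either model.

```python
# Monthly totals for the 5M in + 2M out workload, copied from the figures above.
COST_V3 = 2700000.00
COST_V3_2 = 3550000.00


def accuracy_adjusted_cost(total_cost: float, accuracy: float) -> float:
    """Total spend divided by the fraction of outputs that are correct.

    For a fixed number of requests this is proportional to the cost per
    correct output, so it can be compared directly across models.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return total_cost / accuracy


def break_even_accuracy_ratio(cheaper_cost: float, pricier_cost: float) -> float:
    """Accuracy ratio (cheaper model / pricier model) at which both models
    have the same accuracy-adjusted cost."""
    return cheaper_cost / pricier_cost


ratio = break_even_accuracy_ratio(COST_V3, COST_V3_2)
print(f"V3 stays cheaper per correct output while its accuracy "
      f"is above {ratio:.1%} of V3.2's")
```

Once you have measured pass rates on your own task, pass them to `accuracy_adjusted_cost` for each model and compare the two results directly.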
[DeepSeek V3.2](/models/deepseek--deepseek-v3.2) wins on interactive workloads: the additional instruction-following tuning reduces refusals and format drift on long system prompts, making it more reliable for structured-output agents and multi-turn chat pipelines.
Pick V3 if your workload is batch-oriented, your provider still offers a lower per-token rate on the old checkpoint, and the benchmark delta is immaterial to your use case. Pick V3.2 if you're running interactive agents, structured-output pipelines, or any task sensitive to instruction-following fidelity. The cost reduction over V3 makes it the default upgrade path.