DeepSeek V3 vs DeepSeek V3.2 (2026) — pricing, benchmarks, cheapest providers

Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Model A: DeepSeek V3

671B params · 131K context · deepseek

Cheapest provider: deepinfra

$/1M input: $200000.00

$/1M output: $850000.00

Model B: DeepSeek V3.2

671B params · 131K context · deepseek

Cheapest provider: together-ai

$/1M input: $270000.00

$/1M output: $1100000.00

Specs and cheapest providers

Spec	DeepSeek V3	DeepSeek V3.2
Parameters	671B	671B
Context window	131K tokens	131K tokens
License	deepseek	deepseek
Released	2024-12-26	2025-05-07
Cheapest provider	deepinfra	together-ai
Input / 1M tokens	$200000.00 🏆	$270000.00
Output / 1M tokens	$850000.00 🏆	$1100000.00

Rankings: #7 DeepSeek V3 in cheapest input · #6 DeepSeek V3 in best MMLU · #7 DeepSeek V3.2 in best MMLU · #6 DeepSeek V3 in best HumanEval · #7 DeepSeek V3.2 in best HumanEval

Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

Using each model's cheapest provider:

DeepSeek V3: $2700000.00 /mo

DeepSeek V3.2: $3550000.00 /mo
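
The monthly figures above follow from a linear formula: tokens (in millions) times the per-1M rate, summed over input and output. Here is a minimal Python sketch, assuming the rates shown in the cheapest-provider table. The `RATES` dictionary and `monthly_cost` function are illustrative names, not part of any provider's API.

```python
# Per-1M-token rates copied from the cheapest-provider table above.
# RATES and monthly_cost are illustrative names, not a provider API.
RATES = {
    "DeepSeek V3": {"provider": "deepinfra", "input": 200000.00, "output": 850000.00},
    "DeepSeek V3.2": {"provider": "together-ai", "input": 270000.00, "output": 1100000.00},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear cost: (tokens / 1M) * rate, summed over input and output."""
    rate = RATES[model]
    input_cost = (input_tokens / 1_000_000) * rate["input"]
    output_cost = (output_tokens / 1_000_000) * rate["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Sample workload from this section: 5M input + 2M output tokens per month.
    for model, rate in RATES.items():
        cost = monthly_cost(model, 5_000_000, 2_000_000)
        print(f"{model} via {rate['provider']}: ${cost:.2f} /mo")
```

Swapping in another provider's rates only requires changing the `RATES` entries; the formula itself does not depend on the model.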

What changes at scale

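At the rates above, output tokens account for more than half of the bill once output volume exceeds roughly a quarter of input volume (about 0.24 output tokens per input token for both models). Below that ratio, input dominates and cheaper-input providers win regardless of headline price.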

Workload: DeepSeek V3 · DeepSeek V3.2

1M in · 250K out: $412500.00 · $545000.00

5M in · 2M out: $2700000.00 · $3550000.00

20M in · 10M out: $12500000.00 · $16400000.00

100M in · 60M out: $71000000.00 · $93000000.00
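
Which side of the bill dominates can be checked directly from the rates. The sketch below computes the output share of total cost for each workload row, plus the break-even point where input and output cost are equal (the input rate divided by the output rate). It assumes the cheapest-provider rates listed on this page; `output_share` and `break_even_ratio` are hypothetical helper names.

```python
# Per-1M-token rates from the cheapest-provider table: (input, output).
RATES = {
    "DeepSeek V3": (200000.00, 850000.00),
    "DeepSeek V3.2": (270000.00, 1100000.00),
}

# Workload rows from the list above: (input tokens, output tokens).
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def output_share(in_rate, out_rate, input_tokens, output_tokens):
    """Fraction of the total bill that comes from output tokens."""
    in_cost = input_tokens / 1_000_000 * in_rate
    out_cost = output_tokens / 1_000_000 * out_rate
    return out_cost / (in_cost + out_cost)


def break_even_ratio(in_rate, out_rate):
    """Output tokens per input token at which input and output cost are equal."""
    return in_rate / out_rate


if __name__ == "__main__":
    for model, (in_rate, out_rate) in RATES.items():
        ratio = break_even_ratio(in_rate, out_rate)
        print(f"{model}: break-even at {ratio:.3f} output tokens per input token")
        for in_tok, out_tok in WORKLOADS:
            share = output_share(in_rate, out_rate, in_tok, out_tok)
            print(f"  {in_tok:,} in / {out_tok:,} out: output is {share:.1%} of cost")
```

With these rates the break-even lands near 0.24 output tokens per input token for both models, so even the 1M in · 250K out row is slightly output-dominated.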

Capability vs price

[Scatter plot: benchmark score × $/1M output]

Editor's take

DeepSeek V3.2 is an in-place revision of DeepSeek V3 that keeps the same 671B sparse MoE frame (~37B active parameters). It is positioned as trimming inference cost by roughly 30% versus its predecessor, but that saving is not visible at the cheapest providers listed above, where V3.2 costs 35% more per input token and about 29% more per output token than V3. If you're routing production traffic through V3 today, V3.2 is not a lateral swap: it is the same model family, and whether it is cheaper depends on your provider's rates.

Both models use Multi-Head Latent Attention and DeepSeek's FP8 mixed-precision training pipeline. The difference is mostly in post-training: V3.2 received additional instruction-following refinement and updated RLHF data, which narrows the gap to frontier reasoning models on coding and math benchmarks without inflating active-parameter count.

[DeepSeek V3](/models/deepseek--deepseek-v3) holds the pricing advantage at the cheapest listed provider and on providers that have not yet migrated their weights, where you may find lower spot rates on burst capacity tied to the older checkpoint. For batch inference jobs where latency is unconstrained, the rate difference can offset any benchmark gains from V3.2 and make V3 the cheaper option per correct output.

[DeepSeek V3.2](/models/deepseek--deepseek-v3.2) wins on interactive workloads: the additional instruction-following tuning reduces refusals and format drift on long system prompts, making it more reliable for structured-output agents and multi-turn chat pipelines.

Pick V3 if your workload is batch-oriented, your provider still offers a lower per-token rate on the old checkpoint, and the benchmark delta is immaterial to your use case. Pick V3.2 if you're running interactive agents, structured-output pipelines, or any task sensitive to instruction-following fidelity, and treat it as the default upgrade path wherever your provider prices it at or below V3.
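
The "cheaper per correct output" comparison can be made concrete by dividing the cost of one request by the fraction of requests that succeed. The sketch below assumes the cheapest-provider rates from this page and a request mix of 2,000 input and 500 output tokens, which is a placeholder. No success rates are supplied because the page lists no benchmark data, so any value passed in is your own measurement. All function names are hypothetical.

```python
def request_cost(in_rate, out_rate, in_tokens, out_tokens):
    """Cost of a single request given per-1M-token rates."""
    return in_tokens / 1_000_000 * in_rate + out_tokens / 1_000_000 * out_rate


def cost_per_correct_output(in_rate, out_rate, in_tokens, out_tokens, success_rate):
    """Expected spend per successful output, assuming failed requests are still billed."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return request_cost(in_rate, out_rate, in_tokens, out_tokens) / success_rate


def required_success_ratio(cheap_rates, pricey_rates, in_tokens, out_tokens):
    """How many times higher the pricier model's success rate must be
    for it to match the cheaper model per correct output."""
    cheap = request_cost(*cheap_rates, in_tokens, out_tokens)
    pricey = request_cost(*pricey_rates, in_tokens, out_tokens)
    return pricey / cheap


if __name__ == "__main__":
    v3 = (200000.00, 850000.00)      # deepinfra rates from this page
    v32 = (270000.00, 1100000.00)    # together-ai rates from this page
    # Assumed request shape: 2,000 input tokens and 500 output tokens.
    ratio = required_success_ratio(v3, v32, 2_000, 500)
    print(f"V3.2 needs a success rate {ratio:.2f}x that of V3 to break even")
```

For this assumed request shape, V3.2 would need a success rate about 1.32 times that of V3 to match it per correct output. A different token mix shifts that threshold, so it is worth rerunning with your own traffic profile.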

Related comparisons

DeepSeek V3 vs DeepSeek R1 →
DeepSeek V3.2 vs DeepSeek R1 →
DeepSeek V3.2 vs Llama 3.1 405B Instruct →
DeepSeek V3.2 vs Llama 3.3 70B Instruct →

Full model details

All providers for DeepSeek V3 →
All providers for DeepSeek V3.2 →