Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
Deepseek V3.2 vs Hermes 3 Llama 3.1 405b vs Llama 3.1 405b Instruct
Specs and cheapest providers
| Spec | Deepseek V3.2 | Hermes 3 Llama 3.1 405b | Llama 3.1 405b Instruct |
|---|---|---|---|
| Parameters | — | — | — |
| Context window | — | — | — |
| License | — | — | — |
| Released | — | — | — |
| Cheapest provider | | | |
| Provider | — | — | — |
| Input / 1M tokens | — | — | — |
| Output / 1M tokens | — | — | — |
Benchmark comparison
No benchmark data available yet.
Editor's take
Two versions of the same 405B dense base versus a cost-efficient MoE competitor — the most interesting axis here is what Nous Research's fine-tuning adds to Meta's base, and whether that justifies the serving cost against a capable MoE alternative.
Llama 3.1 405B Instruct is Meta's July 2024 release and the reference point in this comparison: 405 billion dense parameters, a 128K-token (131,072) context window, and the Llama 3.1 Community License. Its MMLU scores ranked near the top among open-weights models at launch, and it follows general instructions well across a broad range of task types. Because the model needs multi-GPU serving, hosted options are limited, though it is commercially available on Lambda Labs and Fireworks.
Hermes 3 Llama 3.1 405B is Nous Research's fine-tune of the same 405B base, released in August 2024. The Hermes 3 fine-tuning recipe emphasizes explicit reasoning-trace generation and complex agentic tasks, much as chain-of-thought training improves step-by-step problem solving. At 405B, it is among the largest openly licensed models trained with explicit reasoning traces. Hosted coverage is limited to Lambda Labs and specialty providers, at pricing similar to the base model. For long-context document analysis and agent orchestration where the reasoning chain needs to be visible, Hermes 3 can outperform the base model in the scenarios Nous targeted.
DeepSeek V3.2, released in late 2025, is a mixture-of-experts (MoE) model that routes each token through roughly 37B active parameters, delivering strong coding, math, and general reasoning benchmark results at a per-token cost well below 405B hosting. The trade-offs are licensing (verify DeepSeek's commercial terms before deploying) and the absence of the explicit reasoning-trace fine-tuning that Hermes 3 offers.
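To make the cost gap concrete, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. It ignores attention cost, memory bandwidth, batching, and the fact that an MoE model still has to hold all of its experts in memory, so the ratio is only indicative and says nothing about actual provider pricing.

```python
# Rule-of-thumb assumption: forward pass ~ 2 FLOPs per active parameter per token.
# This is an approximation for illustration, not a measurement of any provider.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs needed to process one token."""
    return 2.0 * active_params

DENSE_405B_ACTIVE = 405e9  # dense model: every parameter is used for every token
MOE_ACTIVE = 37e9          # "roughly 37B active parameters" per token, from the text

dense_flops = forward_flops_per_token(DENSE_405B_ACTIVE)
moe_flops = forward_flops_per_token(MOE_ACTIVE)
compute_ratio = dense_flops / moe_flops

print(f"dense 405B:        {dense_flops:.2e} FLOPs/token")
print(f"MoE (37B active):  {moe_flops:.2e} FLOPs/token")
print(f"dense / MoE ratio: {compute_ratio:.1f}x")
```

Under these assumptions the dense 405B models need roughly an order of magnitude more compute per token, which is the main reason MoE hosting tends to be priced lower.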
Pick Hermes 3 405B for agent orchestration or multi-hop reasoning tasks where explicit trace quality matters and 405B scale is required. Pick base Llama 3.1 405B for frontier knowledge breadth with community license flexibility. Pick DeepSeek V3.2 for competitive frontier-tier performance at a fraction of 405B inference cost.
Frequently asked questions
- How does Deepseek V3.2 compare to Hermes 3 Llama 3.1 405b and Llama 3.1 405b Instruct on price?
- Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
- Which model is best for coding: Deepseek V3.2, Hermes 3 Llama 3.1 405b, or Llama 3.1 405b Instruct?
- Code benchmarks such as HumanEval appear in the Benchmark comparison section when data is available; none has been published for these models yet. For production code tasks, also consider context window size and provider latency.
- What is the context window for Deepseek V3.2, Hermes 3 Llama 3.1 405b, and Llama 3.1 405b Instruct?
- Context window sizes are listed in the Context window row of the specs table above.
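As a sketch of how per-1M-token prices translate into workload cost, the snippet below computes a bill from input and output token counts. The `Pricing` class, the `workload_cost` helper, the model labels, and every price in it are hypothetical placeholders for illustration, not quotes from any provider; substitute the input and output prices from the table once they are populated.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Price in USD per 1M tokens (hypothetical values, see below)."""
    input_per_million: float
    output_per_million: float


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a workload, billing input and output tokens separately."""
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_million
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_million
    return input_cost + output_cost


# Placeholder prices only: these are NOT real provider rates.
example_prices = {
    "model_a": Pricing(input_per_million=0.50, output_per_million=1.50),
    "model_b": Pricing(input_per_million=3.00, output_per_million=3.00),
}

# Example monthly workload: 200M input tokens and 50M output tokens.
costs = {
    name: workload_cost(p, input_tokens=200_000_000, output_tokens=50_000_000)
    for name, p in example_prices.items()
}

for name, usd in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${usd:,.2f} per month")
```

Because input and output tokens are billed at different rates, the cheaper model can change with the input-to-output mix, so it is worth running the numbers for your own traffic shape.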