Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
Hermes 3 Llama 3.1 405B vs Nemotron-4 340B Instruct vs WizardLM-2 8x22B
Hermes 3 Llama 3.1 405B
405B params · 131K context · llama-3
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed
Nemotron-4 340B Instruct
340B params · 4K context · nvidia-open-model
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed
WizardLM-2 8x22B
141B params · 66K context · wizardlm-2-community
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed
Specs and cheapest providers
| Spec | Hermes 3 Llama 3.1 405B | Nemotron-4 340B Instruct | WizardLM-2 8x22B |
|---|---|---|---|
| Parameters | 405B | 340B | 141B |
| Context window | 131K tokens🏆 | 4K tokens | 66K tokens |
| License | llama-3 | nvidia-open-model | wizardlm-2-community |
| Released | 2024-08-12 | 2024-06-14 | 2024-04-15 |
| Cheapest provider | | | |
| Provider | — | — | — |
| Input / 1M tokens | — | — | — |
| Output / 1M tokens | — | — | — |
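The per-1M-token price rows translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and because no provider prices are listed for these models, the prices in the usage example are made-up placeholders, not quotes.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Estimate the cost of one request from per-1M-token prices."""
    input_cost = (input_tokens / 1_000_000) * usd_per_m_input
    output_cost = (output_tokens / 1_000_000) * usd_per_m_output
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices (assumption): $3.00 per 1M input, $3.00 per 1M output.
    cost = request_cost_usd(2_000, 500, usd_per_m_input=3.0, usd_per_m_output=3.0)
    print(f"Estimated cost per request: ${cost:.4f}")
```

Once a provider price appears in the table, substitute it for the placeholder values to compare the three models on the same workload.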
Benchmark comparison
No benchmark data available yet.
Editor's take
Three large open-weights instruction models, each trained for a specific purpose rather than general deployment. Hermes 3 Llama 3.1 405B is Nous Research's August 2024 fine-tune of Meta's 405B base, applying the same Hermes instruction methodology as the 70B variant at full frontier scale. With 131K context and the Llama 3 community license, it is the only model in this comparison you would consider for complex reasoning chains, long-document agent orchestration, or use cases where the 70B-class hits a capability ceiling. Lambda Labs and a small set of GPU-heavy hosts carry it. Per-token costs are high relative to 70B alternatives, but this is the highest-parameter openly licensed model with explicit reasoning trace training available as of mid-2026.
Nemotron-4 340B Instruct from NVIDIA, released June 2024, serves a fundamentally different purpose: synthetic data generation. At 340B dense parameters with a 4K context ceiling and hosting concentrated on NVIDIA NIM, it is not a practical backend for general production inference. The narrow context window disqualifies it from multi-turn or document-level tasks. NVIDIA designed it as a teacher model — use it to generate diverse, high-quality instruction datasets, then train smaller models on the output. The NVIDIA Open Model License is not OSI-approved.
WizardLM-2 8x22B from Microsoft Research is the cost-efficient option here: 141B total parameters with 39B active per forward pass, a 66K context window, and strong multi-turn conversational benchmark scores at MoE cost. The WizardLM 2 Community License carries attribution requirements that need review before commercial deployment.
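To make "MoE cost" concrete, here is a rough sketch comparing active parameters per token across the three models. It assumes the two dense models use all of their parameters on every token, and it applies the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. The function names are hypothetical, and the result is an order-of-magnitude estimate, not a measured throughput or price.

```python
# Parameter counts in billions, taken from the comparison above.
TOTAL_PARAMS_B = {
    "Hermes 3 Llama 3.1 405B": 405,
    "Nemotron-4 340B Instruct": 340,
    "WizardLM-2 8x22B": 141,
}

# Active parameters per token. Assumption: dense models activate everything;
# the 39B figure for the MoE model comes from the text above.
ACTIVE_PARAMS_B = {
    "Hermes 3 Llama 3.1 405B": 405,
    "Nemotron-4 340B Instruct": 340,
    "WizardLM-2 8x22B": 39,
}


def active_fraction(model: str) -> float:
    """Share of the model's parameters used for each token."""
    return ACTIVE_PARAMS_B[model] / TOTAL_PARAMS_B[model]


def approx_forward_gflops_per_token(model: str) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token."""
    return 2.0 * ACTIVE_PARAMS_B[model]


if __name__ == "__main__":
    for name in TOTAL_PARAMS_B:
        print(
            f"{name}: {active_fraction(name):.0%} active, "
            f"~{approx_forward_gflops_per_token(name):.0f} GFLOPs/token"
        )
```

Under these assumptions WizardLM-2 8x22B activates roughly 28% of its weights per token, which is why its per-token compute sits far below either dense model even though all 141B parameters still have to be held in memory.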
Pick Hermes 3 405B when raw capability at maximum scale and reasoning trace quality are the requirement. Pick Nemotron-4 340B exclusively for synthetic instruction-data generation pipelines on NVIDIA NIM. Pick WizardLM-2 8x22B for cost-efficient conversational quality on MoE infrastructure.
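Context window is the hardest constraint in that decision, so it is worth checking first. The following is a minimal sketch that filters the three models by whether a request fits. The `ModelSpec` class and `models_that_fit` function are hypothetical names, and the token limits are the rounded figures from the specs table rather than exact tokenizer limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    params_b: int
    context_tokens: int  # rounded figure from the specs table (assumption)


MODELS = [
    ModelSpec("Hermes 3 Llama 3.1 405B", 405, 131_000),
    ModelSpec("Nemotron-4 340B Instruct", 340, 4_000),
    ModelSpec("WizardLM-2 8x22B", 141, 66_000),
]


def models_that_fit(prompt_tokens: int, max_output_tokens: int, models=MODELS):
    """Return names of models whose context window covers prompt plus output."""
    needed = prompt_tokens + max_output_tokens
    return [m.name for m in models if m.context_tokens >= needed]


if __name__ == "__main__":
    # A 50K-token document plus a 2K-token answer rules out the 4K model.
    print(models_that_fit(50_000, 2_000))
```

A filter like this only removes models that cannot run the request at all. Choosing among the survivors still comes down to the capability, licensing, and cost trade-offs described above.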
Frequently asked questions
- How does Hermes 3 Llama 3.1 405B compare to Nemotron-4 340B Instruct and WizardLM-2 8x22B on price?
- The table above has rows for input and output prices per 1M tokens at the cheapest available provider for each model. No provider pricing is listed for any of the three models yet.
- Which model is best for coding: Hermes 3 Llama 3.1 405B, Nemotron-4 340B Instruct, or WizardLM-2 8x22B?
- No benchmark data, including HumanEval and other code benchmarks, is available for these models yet. For production code tasks, also consider context window size and provider latency.
- What is the context window for Hermes 3 Llama 3.1 405B, Nemotron-4 340B Instruct, and WizardLM-2 8x22B?
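- Context window sizes are listed in the Context window row of the specs table above: 131K tokens for Hermes 3 Llama 3.1 405B, 4K tokens for Nemotron-4 340B Instruct, and 66K tokens for WizardLM-2 8x22B.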