
Model crosswalk

Three-way comparison, side by side on price, capability, and workload.

Hermes 3 Llama 3.1 405B vs Nemotron-4 340B Instruct vs WizardLM-2 8x22B

Hermes 3 Llama 3.1 405B

405B params · 131K context · llama-3

Cheapest provider
$/1M input
$/1M output

Nemotron-4 340B Instruct

340B params · 4K context · nvidia-open-model

Cheapest provider
$/1M input
$/1M output

WizardLM-2 8x22B

141B params · 66K context · wizardlm-2-community

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
Spec | Hermes 3 Llama 3.1 405B | Nemotron-4 340B Instruct | WizardLM-2 8x22B
Parameters | 405B | 340B | 141B
Context window | 131K tokens 🏆 | 4K tokens | 66K tokens
License | llama-3 | nvidia-open-model | wizardlm-2-community
Released | 2024-08-12 | 2024-06-14 | 2024-04-15
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
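
Per-1M-token prices turn into a workload cost with simple arithmetic. The sketch below is illustrative only: the function name `workload_cost_usd` and the example prices are hypothetical placeholders, not figures quoted by any provider on this page.

```python
def workload_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of a workload from per-1M-token prices.

    Prices are in USD per 1,000,000 tokens, matching the
    "Input / 1M tokens" and "Output / 1M tokens" columns.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices, for illustration only: $1.00 input, $3.00 output per 1M tokens.
monthly_cost = workload_cost_usd(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_m=1.00,
    output_price_per_m=3.00,
)
print(f"Estimated cost: ${monthly_cost:.2f}")
```

Output tokens are usually priced higher than input tokens, so the ratio of output to input in your workload can matter as much as the headline price.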
Benchmark comparison

No benchmark data available yet.

Editor's take
These are three large open-weights instruction models, each trained for a specific purpose rather than general deployment.

Hermes 3 Llama 3.1 405B is Nous Research's August 2024 fine-tune of Meta's 405B base, applying the same Hermes instruction methodology as the 70B variant at full frontier scale. With a 131K context window and the Llama 3 community license, it is the only model in this comparison you would consider for complex reasoning chains, long-document agent orchestration, or use cases where 70B-class models hit a capability ceiling. Lambda Labs and a small set of GPU-heavy hosts carry it. Per-token costs are high relative to 70B alternatives, but as of mid-2026 this is the highest-parameter openly licensed model with explicit reasoning trace training.

Nemotron-4 340B Instruct, released by NVIDIA in June 2024, serves a fundamentally different purpose: synthetic data generation. With 340B dense parameters, a 4K context ceiling, and hosting concentrated on NVIDIA NIM, it is not a practical backend for general production inference. The narrow context window disqualifies it from multi-turn or document-level tasks. NVIDIA designed it as a teacher model: use it to generate diverse, high-quality instruction datasets, then train smaller models on the output. The NVIDIA Open Model License is not OSI-approved.

WizardLM-2 8x22B from Microsoft Research is the cost-efficient option here. It is a mixture-of-experts (MoE) model with 141B total parameters and 39B active per forward pass, a 64K (65,536-token) context window, and strong multi-turn conversational benchmark scores at MoE cost. The WizardLM 2 Community License carries attribution requirements that need review before commercial deployment.

Pick Hermes 3 405B when raw capability at maximum scale and reasoning trace quality are the requirements. Pick Nemotron-4 340B exclusively for synthetic instruction-data generation pipelines on NVIDIA NIM. Pick WizardLM-2 8x22B for cost-efficient conversational quality on MoE infrastructure.
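
The context-window argument above can be checked mechanically before comparing anything else. This is a minimal sketch, assuming the context sizes from the specs table with "K" read as 1,000 tokens; the `ModelSpec` class and `fits_context` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    params_b: int          # total parameters, in billions
    context_tokens: int    # context window, "K" assumed to mean 1,000 tokens
    license_id: str


# Values copied from the specs table on this page.
MODELS = [
    ModelSpec("Hermes 3 Llama 3.1 405B", 405, 131_000, "llama-3"),
    ModelSpec("Nemotron-4 340B Instruct", 340, 4_000, "nvidia-open-model"),
    ModelSpec("WizardLM-2 8x22B", 141, 66_000, "wizardlm-2-community"),
]


def fits_context(models, prompt_tokens, reply_tokens):
    """Return names of models whose window holds the prompt plus the reply."""
    needed = prompt_tokens + reply_tokens
    return [m.name for m in models if m.context_tokens >= needed]


# A 50,000-token document plus a 2,000-token answer rules out the 4K model.
print(fits_context(MODELS, prompt_tokens=50_000, reply_tokens=2_000))
```

Filtering on context first keeps the later price and capability comparison limited to models that can actually run the workload.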
Frequently asked questions
How does Hermes 3 Llama 3.1 405B compare to Nemotron-4 340B Instruct and WizardLM-2 8x22B on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
Which model is best for coding: Hermes 3 Llama 3.1 405B, Nemotron-4 340B Instruct, or WizardLM-2 8x22B?
No benchmark data, including HumanEval and other code benchmarks, is available for these models yet. For production code tasks, also consider context window size and provider latency.
What is the context window for Hermes 3 Llama 3.1 405B, Nemotron-4 340B Instruct, and WizardLM-2 8x22B?
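The context window is 131K tokens for Hermes 3 Llama 3.1 405B, 4K tokens for Nemotron-4 340B Instruct, and 66K tokens for WizardLM-2 8x22B, as listed in the Context window row of the specs table above.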