
Model crosswalk

Three-way, side-by-side comparison on price, capability, and workload.

DeepSeek R1 Distill Llama 70B vs Hermes 3 Llama 3.1 70B vs Refact Llama 3.1 70B
DeepSeek R1 Distill Llama 70B
70B params · 131K context · mit

Hermes 3 Llama 3.1 70B
70B params · 131K context · llama-3

Refact Llama 3.1 70B
70B params · 131K context · llama-3

Specs and cheapest providers
Spec           | DeepSeek R1 Distill Llama 70B | Hermes 3 Llama 3.1 70B | Refact Llama 3.1 70B
Parameters     | 70B                           | 70B                    | 70B
Context window | 131K tokens                   | 131K tokens            | 131K tokens
License        | mit                           | llama-3                | llama-3
Released       | 2025-01-20                    | 2024-08-12             | 2024-09-01
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
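
Per-1M-token prices only become comparable once they are applied to a concrete workload, because input-heavy and output-heavy jobs can rank providers differently. The sketch below shows that arithmetic in Python. The model names and prices are hypothetical placeholders, not values from this page or from any provider.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Dollar cost of a workload, given prices quoted per 1M tokens."""
    return ((input_tokens / 1_000_000) * input_price_per_1m
            + (output_tokens / 1_000_000) * output_price_per_1m)


# Hypothetical (input $/1M, output $/1M) pairs, for illustration only.
PLACEHOLDER_PRICES = {
    "model-a": (0.60, 0.80),
    "model-b": (0.20, 2.00),
}


def cheapest(prices: dict, input_tokens: int, output_tokens: int):
    """Return (name of cheapest model, dict of all costs) for one workload."""
    costs = {
        name: workload_cost(input_tokens, output_tokens, p_in, p_out)
        for name, (p_in, p_out) in prices.items()
    }
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    # Input-heavy job (long prompts, short answers).
    print(cheapest(PLACEHOLDER_PRICES, 10_000_000, 1_000_000))
    # Output-heavy job (short prompts, long reasoning traces).
    print(cheapest(PLACEHOLDER_PRICES, 1_000_000, 10_000_000))
```

With these placeholder numbers the ranking flips between the two workloads, which is why a reasoning model that emits long outputs should be compared mainly on its output price.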
Benchmark comparison

No benchmark data available yet.

Editor's take
DeepSeek R1 Distill Llama 70B, Hermes 3 Llama 3.1 70B, and Refact Llama 3.1 70B all start from a 70B Llama base but diverge sharply in fine-tuning focus. Choosing between them is less about parameter count and more about what each fine-tune was optimized to do.

DeepSeek R1 Distill Llama 70B, released January 2025, distills reasoning-chain supervision from the full 671B R1 MoE into a Llama 3.3 70B base. Independent benchmarks place it at roughly 70–80 percent of full R1's scores on AIME and MATH, at a fraction of the inference cost of running the full 671B model. Served on Groq's hardware, it is one of the faster 70B options for latency-sensitive reasoning workloads. The MIT license allows fully commercial deployment.

Hermes 3 Llama 3.1 70B is a fine-tune by Nous Research, released August 2024 with a 131K context window. The training recipe emphasizes persona fidelity, explicit XML-tagged reasoning traces, and fewer RLHF-induced refusals than the vanilla Meta Instruct release. For agent frameworks where the model needs to follow tool schemas, maintain system-prompt personas across long conversations, and produce structured output without softening, Hermes 3 is a near-zero-cost swap for the base Llama 3.1 70B Instruct with measurable improvements. The Llama 3 community license applies.

Refact Llama 3.1 70B is a fine-tune by Together Computer and Refact AI, released September 2024, targeting code tab-completion and refactoring agent workflows in IDE-embedded pipelines. The 131K context window fits large file trees and multi-file diffs. Outside that specific niche (IDE products and agentic code refactoring loops), the general-purpose Llama 3.1 70B Instruct remains the more versatile option. The Llama 3 community license is inherited.

Pick DeepSeek R1 Distill for multi-step mathematical reasoning at 70B cost. Pick Hermes 3 for agent and tool-use pipelines that need persona fidelity and structured output. Pick Refact for code IDE integrations and file-tree-level refactoring workflows.
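
Models that emit tagged reasoning traces usually need a post-processing step that separates the trace from the final answer before the answer is shown to a user or passed to a tool. The Python sketch below is one way to do that. The tag name `think` and the sample output are assumptions for illustration; the actual tags depend on the model and its prompt template, so check the model card before relying on them.

```python
import re


def split_tagged(text: str, tag: str = "think"):
    """Split model output into tagged sections and the remaining text.

    Returns (sections, remainder): the stripped contents of every
    <tag>...</tag> block, and the text with those blocks removed.
    The default tag name is an assumption, not a documented format.
    """
    pattern = re.compile(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>",
        re.DOTALL,  # let a trace span multiple lines
    )
    sections = [s.strip() for s in pattern.findall(text)]
    remainder = pattern.sub("", text).strip()
    return sections, remainder


if __name__ == "__main__":
    # Hypothetical model output, written by hand for this example.
    sample = "<think>2 + 2 is 4</think>\nThe answer is 4."
    traces, answer = split_tagged(sample)
    print(traces)
    print(answer)
```

The non-greedy `(.*?)` keeps two separate traces from being merged into one match; output with no tags passes through unchanged as the remainder.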
Frequently asked questions
How does DeepSeek R1 Distill Llama 70B compare to Hermes 3 Llama 3.1 70B and Refact Llama 3.1 70B on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
Which model is best for coding: DeepSeek R1 Distill Llama 70B, Hermes 3 Llama 3.1 70B, or Refact Llama 3.1 70B?
No HumanEval or other code benchmark data is available yet for these models. For production code tasks, consider each model's fine-tuning focus, context window size, and provider latency.
What is the context window for DeepSeek R1 Distill Llama 70B, Hermes 3 Llama 3.1 70B, and Refact Llama 3.1 70B?
All three models have a 131K-token context window, as listed in the Context window row of the specs table above.
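
Whether a set of files fits in the context window can be estimated before sending a request. The Python sketch below assumes that "131K" means 131,072 tokens (128 × 1024), that text averages about 4 characters per token, and that 4,096 tokens are reserved for the reply. All three figures are assumptions; real counts depend on the model's tokenizer and should be measured with it when accuracy matters.

```python
import math

# Assumption: "131K tokens" is read as 128 * 1024.
CONTEXT_WINDOW_TOKENS = 131_072


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts, reserved_for_output: int = 4096,
                    context_window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the estimated prompt plus the reserved output budget fits."""
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserved_for_output <= context_window


if __name__ == "__main__":
    small_files = ["def f():\n    return 1\n"] * 10
    print(fits_in_context(small_files))
    print(fits_in_context(["x" * 600_000]))
```

Because the estimate is a heuristic, leave some headroom below the limit instead of packing the window to the last token.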