meta · llama family · Verified May 27, 2026

Llama 3.1 70B Instruct

70B params · 131K context · llama-3 · released 2024-07 · 5 providers

Meta's Llama 3.1 70B Instruct is a 70-billion-parameter open-weights text model released in July 2024 under the Llama 3 community license. The 3.1 generation extended the 70B model's context window to 131K tokens, making it viable for long-document tasks such as contract analysis and codebase summarization that previously required larger, costlier models. On MMLU it scores around 79–80, competitive with contemporaries at this parameter count. Llama 3.3 70B, released in December 2024, targets the same footprint with meaningfully better instruction-following, so new deployments should default to 3.3 unless they need to pin a specific weight hash for reproducibility. If provider adoption and price compression at the 70B tier continue, per-token costs are likely to keep falling through 2026.

Cheapest now: $0.40 per 1M output tokens (DeepInfra, fp16)

90-day price Δ (cheapest-host trend): not available

Fastest TTFT: 88 ms (Groq, 575 tok/s)

Providers: 5 indexed providers

Ranks: #4 Cheapest input · #2 Cheapest output · #4 Blended monthly cost · #1 Fastest TTFT · #1 Fastest throughput · #1 Context window · #1 MMLU score · #1 HumanEval score · #1 Provider count

#	Provider	$/1M in	$/1M out	TTFT p50	Tok/s	Quant
1	DeepInfra (cheapest)	$0.23	$0.40	430 ms	195	fp16
2	Fireworks AI	$0.22	$0.88	390 ms	240	fp16
3	Hyperbolic	$0.40	$0.40	510 ms	105	fp16
4	Cerebras Inference	$0.60	$0.60	210 ms	2100	fp16
5	Groq (fastest TTFT)	$0.59	$0.79	88 ms	575	fp8
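The table reports time to first token (TTFT) and throughput separately, and the two do not always favor the same provider. A minimal Python sketch of how they might be combined into a rough end-to-end latency estimate follows. The formula (TTFT plus output tokens divided by tok/s) is a simplifying assumption, not this page's methodology, and the names `PERF`, `estimated_seconds`, and `fastest_for` are hypothetical.

```python
# (TTFT p50 in ms, throughput in tok/s), copied from the provider table.
PERF = {
    "DeepInfra": (430, 195),
    "Fireworks AI": (390, 240),
    "Hyperbolic": (510, 105),
    "Cerebras Inference": (210, 2100),
    "Groq": (88, 575),
}


def estimated_seconds(provider: str, output_tokens: int) -> float:
    """Rough response time: time to first token plus steady-state generation.

    Assumption: throughput is constant and queueing/network time is ignored.
    """
    ttft_ms, tok_per_s = PERF[provider]
    return ttft_ms / 1000 + output_tokens / tok_per_s


def fastest_for(output_tokens: int) -> str:
    """Return the provider with the lowest estimated response time."""
    return min(PERF, key=lambda name: estimated_seconds(name, output_tokens))


for n in (10, 500):
    print(n, "output tokens ->", fastest_for(n))
```

Under this simplified model, Groq comes out ahead for very short responses because of its low TTFT, while Cerebras Inference comes out ahead for longer responses because of its higher throughput. Measured latency will differ with load, prompt length, and region.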

Publisher: meta
Parameters: 70B
Context window: 131k tokens
License: llama-3
Released: 2024-07-23
Family: llama

Pricing by quantization

Provider	Input / 1M	Output / 1M	Tok/s
Cerebras Inference	$0.6000	$0.6000	2100
DeepInfra	$0.2300	$0.4000	195
Fireworks AI	$0.2200	$0.8800	240
Hyperbolic	$0.4000	$0.4000	105

Questions developers ask

How much does it cost to run Llama 3.1 70B Instruct for 100M tokens?

Running Llama 3.1 70B Instruct with 100M input and 10M output tokens per month costs approximately $27.00 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
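The $27.00 figure can be reproduced from the per-million-token prices in the provider table. The sketch below is illustrative only: the function name `monthly_cost` is hypothetical, and it ignores prompt caching, volume discounts, and taxes.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Return the cost in USD for a token volume at per-1M-token prices."""
    cost = (input_tokens / 1_000_000) * price_in_per_m \
        + (output_tokens / 1_000_000) * price_out_per_m
    return round(cost, 2)


# DeepInfra prices from the provider table: $0.23 in, $0.40 out per 1M tokens.
deepinfra_cost = monthly_cost(100_000_000, 10_000_000, 0.23, 0.40)
print(f"${deepinfra_cost:.2f}")  # $27.00 = 100 * 0.23 + 10 * 0.40
```

Swapping in a different input/output split shows how sensitive the total is to the ratio: output tokens cost about 1.7 times as much as input tokens at these prices.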

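What is the cheapest provider for Llama 3.1 70B Instruct?

DeepInfra currently offers Llama 3.1 70B Instruct at the lowest total cost for a standard workload such as the 100M-input, 10M-output example above. Fireworks AI lists a slightly lower input price ($0.22 vs. $0.23 per 1M tokens), but its $0.88 output price makes it more expensive for any workload with more than a small share of output tokens. Prices change frequently, so check the provider table above for the latest data.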
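Which provider is cheapest depends on the input/output mix, so it can help to rank all of them for your own workload. The following is a sketch using the prices listed in the provider table; the names `PRICES` and `rank_by_cost` are hypothetical, and the prices should be refreshed before relying on the result.

```python
# (input $/1M, output $/1M), copied from the provider table on this page.
PRICES = {
    "DeepInfra": (0.23, 0.40),
    "Fireworks AI": (0.22, 0.88),
    "Hyperbolic": (0.40, 0.40),
    "Cerebras Inference": (0.60, 0.60),
    "Groq": (0.59, 0.79),
}


def rank_by_cost(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Rank providers by total USD cost for a workload in millions of tokens."""
    totals = {
        name: round(input_m * p_in + output_m * p_out, 2)
        for name, (p_in, p_out) in PRICES.items()
    }
    # Cheapest first.
    return sorted(totals.items(), key=lambda item: item[1])


ranking = rank_by_cost(input_m=100, output_m=10)
for name, cost in ranking:
    print(f"{name:<20} ${cost:>6.2f}")
```

For the 100M-input, 10M-output workload this puts DeepInfra first at $27.00. With no output tokens at all, Fireworks AI would rank first instead, which is why the ratio matters.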

What context window does Llama 3.1 70B Instruct support?

Llama 3.1 70B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower.

What's the cheapest way to run Llama 3.1 70B Instruct?

The cheapest way to run Llama 3.1 70B Instruct is via DeepInfra, at $0.23 per million input tokens and $0.40 per million output tokens. If your workload is prompt-heavy, prompt caching, where the provider supports it, can reduce costs further.

Is there a free tier for Llama 3.1 70B Instruct?

Free tiers vary by provider and change frequently. Check each provider's current pricing page for trial credits or free-tier limits. The prices shown on this page reflect paid API access.

How much does Llama 3.1 70B Instruct cost for 1M tokens?

Input pricing for Llama 3.1 70B Instruct starts at $0.22 per million tokens on Fireworks AI, with DeepInfra close behind at $0.23. Output pricing starts at $0.40 per million tokens on DeepInfra and Hyperbolic. Depending on the provider, output tokens cost between 1× and 4× the input price.
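The output-to-input price ratio for each provider can be computed directly from the provider table. The sketch below uses the listed prices; the names `PRICES` and `ratios` are hypothetical.

```python
# (input $/1M, output $/1M), copied from the provider table on this page.
PRICES = {
    "DeepInfra": (0.23, 0.40),
    "Fireworks AI": (0.22, 0.88),
    "Hyperbolic": (0.40, 0.40),
    "Cerebras Inference": (0.60, 0.60),
    "Groq": (0.59, 0.79),
}

# How many times more an output token costs than an input token, per provider.
ratios = {
    name: round(p_out / p_in, 2)
    for name, (p_in, p_out) in PRICES.items()
}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<20} {ratio:.2f}x")
```

At the listed prices the ratio runs from 1.00 (Hyperbolic and Cerebras Inference, which charge the same for input and output) up to 4.00 (Fireworks AI).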

Methodology: prices scraped nightly from public pricing pages; snapshots append-only.

Prices verified May 27, 2026 by scraping 5 providers.
