Llama 3.1 405B Instruct — pricing, providers, and benchmarks

Parameters: 405B
Context window: 131k tokens
License: llama-3
Released: 2024-07-23

Meta's Llama 3.1 405B Instruct was the largest open-weights model ever released at launch (July 2024) and remains the benchmark for "what's possible with open weights." At 405 billion dense parameters and a 131k-token context window, it competes with frontier closed models on most evaluations, but at roughly 5-10x the per-token cost of a 70B model on most providers. Realistic hosted pricing in 2026 ranges from $2.70 input / $8.00 output per 1M tokens on DeepInfra, the cheapest provider, up to a flat $3.50/$3.50 on Together AI. It is worth running for high-stakes tasks where output quality justifies the price gap: complex agents, long-context analysis, or critical content generation. For most chat and RAG workloads, Llama 3.3 70B delivers 90% of the quality at 10% of the cost.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| DeepInfra | $2.70 | $8.00 | $350.00 | 131k |
| Together AI | $3.50 | $3.50 | $385.00 | 131k |
| Fireworks AI | $3.00 | $9.00 | $390.00 | 131k |
| OpenRouter | $3.50 | $10.50 | $455.00 | 131k |
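The monthly-cost column follows directly from the per-million-token rates. A minimal sketch of that arithmetic, with rates copied from the table above (the `PRICES` dict and `monthly_cost` helper are illustrative names, not a provider API):

```python
# Monthly cost = input_millions * input_rate + output_millions * output_rate,
# with rates in USD per 1M tokens. Rates below are taken from the pricing table.
PRICES = {  # provider: (input $/1M, output $/1M)
    "DeepInfra": (2.70, 8.00),
    "Together AI": (3.50, 3.50),
    "Fireworks AI": (3.00, 9.00),
    "OpenRouter": (3.50, 10.50),
}

def monthly_cost(provider: str, input_m: float, output_m: float) -> float:
    """USD cost for input_m million input tokens and output_m million output tokens."""
    inp, out = PRICES[provider]
    return input_m * inp + output_m * out

# Reproduce the table's standard workload: 100M input + 10M output.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 100, 10):.2f}")
```

Plugging in your own monthly volumes is the quickest way to check whether the ranking in the table holds for your workload.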

Frequently asked questions

How much does it cost to run Llama 3.1 405B Instruct for 100M tokens?

Running Llama 3.1 405B Instruct with 100M input and 10M output tokens per month costs approximately $350.00 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
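The ratio sensitivity mentioned above is easy to see numerically. A sketch comparing the two pricing shapes from the table (DeepInfra's cheap-input/expensive-output rates versus Together AI's flat rate); the helper name is illustrative:

```python
# Same total volume (110M tokens), different input/output split:
# the cheapest provider flips. Rates are from the pricing table above.
PRICES = {
    "DeepInfra": (2.70, 8.00),    # input $/1M, output $/1M
    "Together AI": (3.50, 3.50),  # flat rate
}

def cost(rates, input_m, output_m):
    inp, out = rates
    return input_m * inp + output_m * out

# Input-heavy (100M in / 10M out): DeepInfra's low input rate wins.
input_heavy = {p: cost(r, 100, 10) for p, r in PRICES.items()}

# Output-heavy (10M in / 100M out): Together AI's flat rate wins.
output_heavy = {p: cost(r, 10, 100) for p, r in PRICES.items()}

print(input_heavy)
print(output_heavy)
```

For generation-heavy workloads (summarization, long-form writing), the flat-rate provider can be substantially cheaper despite a higher input price.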

What is the cheapest provider for Llama 3.1 405B Instruct?

DeepInfra currently offers Llama 3.1 405B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.

What context window does Llama 3.1 405B Instruct support?

Llama 3.1 405B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
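When planning prompts against that limit, a rough pre-flight check helps avoid overruns. A sketch using the common ~4-characters-per-token heuristic for English text (the exact count requires the Llama 3.1 tokenizer, so treat this as an estimate; the `reserved_for_output` budget is an illustrative assumption):

```python
CONTEXT_WINDOW = 131_072  # tokens, per the model card above

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # A real check should use the Llama 3.1 tokenizer instead.
    return max(1, len(text) // 4)

def fits(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """True if the prompt likely fits the window, leaving room for the response."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits("hello " * 1_000))   # short prompt: fits
print(fits("x" * 600_000))      # ~150k estimated tokens: does not fit
```

Remember that some providers cap the usable window below 131,072 tokens, so the budget should come from the provider's limit, not the model's.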