Llama 3.1 405B Instruct — pricing, providers, and benchmarks

Parameters: 405B
Context window: 131k tokens
License: llama-3
Released: 2024-07-23

Meta's Llama 3.1 405B Instruct was the largest open-weights model ever released at launch (July 2024) and remains the benchmark for "what's possible with open weights." At 405 billion dense parameters and a 131k-token context window, it competes with frontier closed models on most evaluations, but at roughly 5-10x the per-token cost of a 70B model on most providers. Realistic hosted pricing in 2026 ranges from $2.70 input / $8.00 output per 1M tokens on DeepInfra, the cheapest provider, up to a flat $3.50/$3.50 on Together AI. It is worth running for high-stakes tasks where output quality justifies the price gap: complex agents, long-context analysis, or critical content generation. For most chat and RAG workloads, Llama 3.3 70B delivers 90% of the quality at 10% of the cost.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| DeepInfra | $2.70 | $8.00 | $350.00 | 131k |
| Together AI | $3.50 | $3.50 | $385.00 | 131k |
| Fireworks AI | $3.00 | $9.00 | $390.00 | 131k |
| OpenRouter | $3.50 | $10.50 | $455.00 | 131k |
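The monthly-cost column follows directly from the per-million-token rates. A minimal sketch of that arithmetic, with rates copied from the table above (the `PRICES` dict and `monthly_cost` helper are illustrative names, not a provider API):

```python
# Monthly cost = input_millions * input_rate + output_millions * output_rate,
# with rates in USD per 1M tokens. Rates below are taken from the pricing table.
PRICES = {  # provider: (input $/1M, output $/1M)
    "DeepInfra": (2.70, 8.00),
    "Together AI": (3.50, 3.50),
    "Fireworks AI": (3.00, 9.00),
    "OpenRouter": (3.50, 10.50),
}

def monthly_cost(provider: str, input_m: float, output_m: float) -> float:
    """USD cost for input_m million input tokens and output_m million output tokens."""
    inp, out = PRICES[provider]
    return input_m * inp + output_m * out

# Reproduce the table's standard workload: 100M input + 10M output.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 100, 10):.2f}")
```

Plugging in your own monthly volumes is the quickest way to check whether the ranking in the table holds for your workload.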

Frequently asked questions

How much does it cost to run Llama 3.1 405B Instruct for 100M tokens?

Running Llama 3.1 405B Instruct with 100M input and 10M output tokens per month costs approximately $350.00 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
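The ratio sensitivity mentioned above is easy to see numerically. A sketch comparing the two pricing shapes from the table (DeepInfra's cheap-input/expensive-output rates versus Together AI's flat rate); the helper name is illustrative:

```python
# Same total volume (110M tokens), different input/output split:
# the cheapest provider flips. Rates are from the pricing table above.
PRICES = {
    "DeepInfra": (2.70, 8.00),    # input $/1M, output $/1M
    "Together AI": (3.50, 3.50),  # flat rate
}

def cost(rates, input_m, output_m):
    inp, out = rates
    return input_m * inp + output_m * out

# Input-heavy (100M in / 10M out): DeepInfra's low input rate wins.
input_heavy = {p: cost(r, 100, 10) for p, r in PRICES.items()}

# Output-heavy (10M in / 100M out): Together AI's flat rate wins.
output_heavy = {p: cost(r, 10, 100) for p, r in PRICES.items()}

print(input_heavy)
print(output_heavy)
```

For generation-heavy workloads (summarization, long-form writing), the flat-rate provider can be substantially cheaper despite a higher input price.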

What is the cheapest provider for Llama 3.1 405B Instruct?

DeepInfra currently offers Llama 3.1 405B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.

What context window does Llama 3.1 405B Instruct support?

Llama 3.1 405B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
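When planning prompts against that limit, a rough pre-flight check helps avoid overruns. A sketch using the common ~4-characters-per-token heuristic for English text (the exact count requires the Llama 3.1 tokenizer, so treat this as an estimate; the `reserved_for_output` budget is an illustrative assumption):

```python
CONTEXT_WINDOW = 131_072  # tokens, per the model card above

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # A real check should use the Llama 3.1 tokenizer instead.
    return max(1, len(text) // 4)

def fits(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """True if the prompt likely fits the window, leaving room for the response."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits("hello " * 1_000))   # short prompt: fits
print(fits("x" * 600_000))      # ~150k estimated tokens: does not fit
```

Remember that some providers cap the usable window below 131,072 tokens, so the budget should come from the provider's limit, not the model's.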