Llama 3.1 405B Instruct — pricing, providers, and benchmarks
Meta's Llama 3.1 405B Instruct was the largest open-weights model ever released at launch (July 2024) and remains the benchmark for "what's possible with open weights." At 405 billion dense parameters and a 131K context window, it competes with frontier closed models on most evaluations — but at 5-10x the per-token cost of a 70B model on most providers. Realistic hosted pricing in 2026 ranges from $2.70 input / $8.00 output per 1M tokens on DeepInfra, the cheapest provider, up to $3.50 / $10.50 on OpenRouter. It's worth running for high-stakes tasks where output quality justifies the price gap: complex agents, long-context analysis, or critical content generation. For most chat and RAG workloads, Llama 3.3 70B delivers roughly 90% of the quality at about 10% of the cost.
Provider pricing
Sorted by total monthly cost for 100M input + 10M output tokens.
| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| DeepInfra | $2.70 | $8.00 | $350.00 | 131K |
| Together AI | $3.50 | $3.50 | $385.00 | 131K |
| Fireworks AI | $3.00 | $9.00 | $390.00 | 131K |
| OpenRouter | $3.50 | $10.50 | $455.00 | 131K |
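The monthly figures above follow directly from the per-1M-token prices. A minimal sketch of the calculation, using the table's prices (which may change) and the same 100M-input / 10M-output workload:

```python
# Estimate monthly cost for a fixed token workload across providers.
# Prices ($ per 1M tokens) are copied from the table above and may be stale.
PRICES = {
    "DeepInfra":    (2.70, 8.00),
    "Together AI":  (3.50, 3.50),
    "Fireworks AI": (3.00, 9.00),
    "OpenRouter":   (3.50, 10.50),
}

def monthly_cost(input_m: float, output_m: float,
                 in_price: float, out_price: float) -> float:
    """Total USD cost for input_m / output_m million tokens."""
    return input_m * in_price + output_m * out_price

# Rank providers by total cost for 100M input + 10M output tokens.
for name, (inp, outp) in sorted(PRICES.items(),
                                key=lambda kv: monthly_cost(100, 10, *kv[1])):
    print(f"{name}: ${monthly_cost(100, 10, inp, outp):,.2f}")
```

Changing the input/output ratio reorders the ranking: Together AI's flat $3.50/$3.50 pricing becomes the cheapest option for output-heavy workloads, since the others charge 2-3x more per output token.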
Frequently asked questions
How much does it cost to run Llama 3.1 405B Instruct for 100M tokens?
Running Llama 3.1 405B Instruct with 100M input and 10M output tokens per month costs approximately $350.00 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
What is the cheapest provider for Llama 3.1 405B Instruct?
DeepInfra currently offers Llama 3.1 405B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.
What context window does Llama 3.1 405B Instruct support?
Llama 3.1 405B Instruct supports a context window of 131,072 tokens (131K). Individual providers may cap this lower — see the pricing table for per-provider context limits.