Llama 3.1 8B Instruct — pricing, providers, and benchmarks

Parameters: 8B
Context window: 131k tokens
License: llama-3
Released: 2024-07-23

Llama 3.1 8B Instruct is the workhorse of the small-model tier: 8 billion parameters, a 131K-token context window, and per-token pricing that sits at $0.05–$0.10 per million tokens across most providers — cheap enough to run at scale without budget tracking. It's the right choice for classification, simple summarization, content moderation, autocomplete-style code suggestions, and any workload where you'd otherwise reach for a custom fine-tuned model. Quality is well below the 70B variant for complex reasoning or nuanced generation, but for the large share of production LLM workloads that are "extract this field" or "categorize this email," 8B is more than enough — and lets you run 10x the volume for the same monthly bill.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| Groq | $0.0500 | $0.0800 | $5.80 | 131k |
| DeepInfra | $0.0600 | $0.0600 | $6.60 | 131k |
| OpenRouter | $0.0600 | $0.0900 | $6.90 | 131k |
| Fireworks AI | $0.0700 | $0.0700 | $7.70 | 131k |
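The monthly-cost column above is straightforward arithmetic: tokens divided by one million, times the per-1M rate, summed across input and output. A minimal sketch, using the prices from the table and the standard 100M-input / 10M-output workload:

```python
# Estimate monthly cost from per-1M-token provider pricing.
# Prices are copied from the table above; the workload is the
# standard 100M input + 10M output tokens used for the comparison.

PRICING = {  # provider: (input $ per 1M tokens, output $ per 1M tokens)
    "Groq": (0.05, 0.08),
    "DeepInfra": (0.06, 0.06),
    "OpenRouter": (0.06, 0.09),
    "Fireworks AI": (0.07, 0.07),
}

def monthly_cost(provider, input_tokens=100e6, output_tokens=10e6):
    """Total monthly cost in dollars for the given token volume."""
    input_rate, output_rate = PRICING[provider]
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Sort providers cheapest-first, as in the table.
for name in sorted(PRICING, key=monthly_cost):
    print(f"{name}: ${monthly_cost(name):.2f}")
```

Adjusting `input_tokens` and `output_tokens` to your own traffic ratio can reorder the ranking: providers with symmetric pricing (DeepInfra, Fireworks AI) gain ground on output-heavy workloads.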

Frequently asked questions

How much does it cost to run Llama 3.1 8B Instruct for 100M tokens?

Running Llama 3.1 8B Instruct with 100M input and 10M output tokens per month costs approximately $5.80 on Groq, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.

What is the cheapest provider for Llama 3.1 8B Instruct?

Groq currently offers Llama 3.1 8B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.

What context window does Llama 3.1 8B Instruct support?

Llama 3.1 8B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
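At these prices, even a maximally full context window is cheap. A rough sketch of the per-request input cost, assuming Groq's $0.05 per 1M input tokens from the table:

```python
# Per-request input cost for a prompt that fills the entire context window,
# at the Groq input rate from the pricing table above.

CONTEXT_WINDOW = 131_072       # tokens (128 * 1024)
INPUT_PRICE_PER_1M = 0.05      # dollars per 1M input tokens (Groq)

full_prompt_cost = (CONTEXT_WINDOW / 1_000_000) * INPUT_PRICE_PER_1M
print(f"${full_prompt_cost:.4f}")  # about $0.0066 per full-context prompt
```

In other words, you can send well over a hundred full-context requests for a dollar, which is why per-token pricing at this tier rarely needs close tracking.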