Llama 3.1 8B Instruct — pricing, providers, and benchmarks

Parameters: 8B
Context window: 131k tokens
License: llama-3
Released: 2024-07-23

Llama 3.1 8B Instruct is the workhorse of the small-model tier: 8 billion parameters, a 131K-token context window, and per-token pricing that sits at $0.05–$0.10 per million tokens across most providers — cheap enough to run at scale without budget tracking. It's the right choice for classification, simple summarization, content moderation, autocomplete-style code suggestions, and any workload where you'd otherwise reach for a custom fine-tuned model. Quality is well below the 70B variant for complex reasoning or nuanced generation, but for the large share of production LLM workloads that are "extract this field" or "categorize this email," 8B is more than enough — and lets you run 10x the volume for the same monthly bill.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| Groq | $0.0500 | $0.0800 | $5.80 | 131k |
| DeepInfra | $0.0600 | $0.0600 | $6.60 | 131k |
| OpenRouter | $0.0600 | $0.0900 | $6.90 | 131k |
| Fireworks AI | $0.0700 | $0.0700 | $7.70 | 131k |
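The monthly-cost column above is straightforward arithmetic: tokens divided by one million, times the per-1M rate, summed across input and output. A minimal sketch, using the prices from the table and the standard 100M-input / 10M-output workload:

```python
# Estimate monthly cost from per-1M-token provider pricing.
# Prices are copied from the table above; the workload is the
# standard 100M input + 10M output tokens used for the comparison.

PRICING = {  # provider: (input $ per 1M tokens, output $ per 1M tokens)
    "Groq": (0.05, 0.08),
    "DeepInfra": (0.06, 0.06),
    "OpenRouter": (0.06, 0.09),
    "Fireworks AI": (0.07, 0.07),
}

def monthly_cost(provider, input_tokens=100e6, output_tokens=10e6):
    """Total monthly cost in dollars for the given token volume."""
    input_rate, output_rate = PRICING[provider]
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Sort providers cheapest-first, as in the table.
for name in sorted(PRICING, key=monthly_cost):
    print(f"{name}: ${monthly_cost(name):.2f}")
```

Adjusting `input_tokens` and `output_tokens` to your own traffic ratio can reorder the ranking: providers with symmetric pricing (DeepInfra, Fireworks AI) gain ground on output-heavy workloads.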

Frequently asked questions

How much does it cost to run Llama 3.1 8B Instruct for 100M tokens?

Running Llama 3.1 8B Instruct with 100M input and 10M output tokens per month costs approximately $5.80 on Groq, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.

What is the cheapest provider for Llama 3.1 8B Instruct?

Groq currently offers Llama 3.1 8B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.

What context window does Llama 3.1 8B Instruct support?

Llama 3.1 8B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
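At these prices, even a maximally full context window is cheap. A rough sketch of the per-request input cost, assuming Groq's $0.05 per 1M input tokens from the table:

```python
# Per-request input cost for a prompt that fills the entire context window,
# at the Groq input rate from the pricing table above.

CONTEXT_WINDOW = 131_072       # tokens (128 * 1024)
INPUT_PRICE_PER_1M = 0.05      # dollars per 1M input tokens (Groq)

full_prompt_cost = (CONTEXT_WINDOW / 1_000_000) * INPUT_PRICE_PER_1M
print(f"${full_prompt_cost:.4f}")  # about $0.0066 per full-context prompt
```

In other words, you can send well over a hundred full-context requests for a dollar, which is why per-token pricing at this tier rarely needs close tracking.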