Llama 3.3 70B Instruct — pricing, providers, and benchmarks
Meta's Llama 3.3 70B Instruct shipped in December 2024 as a quiet, mid-cycle upgrade to the Llama 3 family: same 70B parameter count as 3.1, same 131K context window, but materially improved instruction following, math, and reasoning that bring it within striking distance of the 405B variant on most evals. For production workloads where Llama 3.1 70B was "good enough but not quite there," 3.3 is the straightforward swap. It is the most widely deployed open-weights model on hosted inference in 2026, and every tier-1 provider in the catalog hosts it, with prices ranging from $0.23/$0.40 per 1M tokens (input/output) on DeepInfra at the cheap end up to $0.88/$0.88 on Together AI; speed-optimized Groq sits in between at $0.59/$0.79.
Provider pricing
Sorted by total monthly cost for 100M input + 10M output tokens.
| Provider | Input ($/1M) | Output ($/1M) | Monthly cost | Context |
|---|---|---|---|---|
| DeepInfra | $0.23 | $0.40 | $27.00 | 131K |
| Fireworks AI | $0.22 | $0.88 | $30.80 | 131K |
| OpenRouter | $0.27 | $0.85 | $35.50 | 131K |
| Groq | $0.59 | $0.79 | $66.90 | 131K |
| Together AI | $0.88 | $0.88 | $96.80 | 131K |
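The "Monthly cost" column is plain blended arithmetic: input price times input volume plus output price times output volume, both measured in millions of tokens. A minimal Python sketch that reproduces the column (provider prices copied from the table; assumes no prompt-caching discounts or volume tiers):

```python
def monthly_cost(input_per_m: float, output_per_m: float,
                 input_tokens_m: float = 100.0,
                 output_tokens_m: float = 10.0) -> float:
    """Blended monthly cost in USD for a workload given in millions of tokens."""
    return input_per_m * input_tokens_m + output_per_m * output_tokens_m

# Prices ($/1M input, $/1M output) from the table above.
providers = {
    "DeepInfra":    (0.23, 0.40),
    "Fireworks AI": (0.22, 0.88),
    "OpenRouter":   (0.27, 0.85),
    "Groq":         (0.59, 0.79),
    "Together AI":  (0.88, 0.88),
}

for name, (inp, out) in sorted(providers.items(),
                               key=lambda kv: monthly_cost(*kv[1])):
    print(f"{name:13s} ${monthly_cost(inp, out):.2f}")
```

Changing the default 100M/10M split shows why ranking by input price alone misleads: Fireworks AI has the lowest input rate but its $0.88 output rate pushes it behind DeepInfra on any output-heavy workload.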
Frequently asked questions
How much does it cost to run Llama 3.3 70B Instruct for 100M tokens?
Running Llama 3.3 70B Instruct with 100M input and 10M output tokens per month costs approximately $27.00 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
What is the cheapest provider for Llama 3.3 70B Instruct?
DeepInfra currently offers Llama 3.3 70B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.
What context window does Llama 3.3 70B Instruct support?
Llama 3.3 70B Instruct supports a context window of 131,072 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
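A quick way to sanity-check whether a request fits is to compare prompt tokens plus the completion budget against the window. A minimal sketch, assuming (as on most hosted APIs) that output tokens share the 131,072-token window with the prompt; the 4-characters-per-token heuristic is a rough English-text approximation, not the actual Llama tokenizer:

```python
CONTEXT_WINDOW = 131_072  # Llama 3.3 70B Instruct context length in tokens

def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    # On most hosted APIs the completion budget counts against the same window.
    return prompt_tokens + max_output_tokens <= window

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars/token for English). For exact counts, run the
    # model's real tokenizer instead -- this is only a pre-flight estimate.
    return max(1, len(text) // 4)
```

Remember that a provider may cap the window below the model's native 131,072 tokens, so pass the per-provider limit from the table rather than the default when it differs.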