Cheapest blended LLM pricing

May 27, 2026

Blended cost for a workload of 100M input tokens and 10M output tokens per month, a 10:1 mix intended as a realistic cost-of-ownership comparison for most production applications.

Stale entries

14+ days old

These models haven't had a confirmed pricing scrape in the last 14 days.

#	Model	Family	Blended monthly cost	Providers	Last updated
01	Llama 3.1 8B Instruct (stale)	llama	$2.50/mo	1	May 27
02	Qwen 2.5 72B Instruct (stale)	qwen	$40.00/mo	1	May 27
03	DeepSeek V3 (stale)	deepseek	$40.90/mo	1	May 27
04	Llama 3.1 70B Instruct (stale)	llama	$44.00/mo	1	May 27
05	DeepSeek R1 Distill Llama 70B (stale)	deepseek	$78.00/mo	1	May 27

Related leaderboards

Cheapest LLM Input Price
Cheapest LLM Output Price
Fastest LLM Time to First Token
Highest LLM Throughput (tok/s)
Longest LLM Context Window
Best LLM MMLU Score
Best LLM HumanEval Score
Most Available LLM Providers

Frequently asked questions

What workload mix does the blended ranking use?

The blended cost leaderboard calculates total monthly spend using a fixed reference workload of 100 million input tokens and 10 million output tokens per month. That 10:1 ratio reflects a common pattern for retrieval-augmented generation and document-processing pipelines, where large contexts inflate prompt size relative to completion length. The dollar figure shown is `(price_input × 100) + (price_output × 10)`, where both prices are in USD per million tokens and 100 and 10 are the monthly volumes in millions of tokens. If your actual ratio differs substantially (say, you run a coding assistant where output volume is much higher), the calculator will give you a more accurate number.
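As a minimal sketch of the formula above, the following Python function computes the blended figure from two per-million-token prices. The function name, parameter names, and example prices are illustrative assumptions; they are not taken from the leaderboard or its calculator.

```python
def blended_monthly_cost(price_input, price_output,
                         input_mtok=100.0, output_mtok=10.0):
    """Return the monthly cost in USD.

    price_input, price_output: USD per million tokens.
    input_mtok, output_mtok: monthly volume in millions of tokens.
    The defaults are the 100M-in / 10M-out reference workload.
    """
    return price_input * input_mtok + price_output * output_mtok


# Hypothetical prices, chosen only for illustration.
reference = blended_monthly_cost(0.02, 0.05)

# The same prices at a 3:1 input-to-output ratio (30M in, 10M out).
output_heavy = blended_monthly_cost(0.02, 0.05, input_mtok=30, output_mtok=10)

print(f"reference workload: ${reference:.2f}/mo")
print(f"3:1 workload:       ${output_heavy:.2f}/mo")
```

Passing your own volumes instead of the defaults is the same adjustment the calculator makes when your ratio is not 10:1.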

Why combine input and output rather than rank them separately?

A model can rank cheapest on input yet be expensive enough on output that it is not the best deal for any real workload. Ranking the two prices separately gives you two numbers that require mental arithmetic to combine, and that arithmetic depends on an assumed ratio. By fixing the ratio at 10:1 and computing the blended cost, this leaderboard gives you a single comparable figure. It is a useful first filter, not a replacement for running your own numbers, but it is more actionable than reading two separate columns and doing the math in your head.
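To illustrate how the separate rankings and the blended ranking can disagree, here is a small sketch with two made-up models. The names `model_a` and `model_b` and their prices are hypothetical and do not correspond to any entry on the leaderboard.

```python
# Hypothetical prices in USD per million tokens (not real provider data).
PRICES = {
    "model_a": {"input": 0.10, "output": 2.00},
    "model_b": {"input": 0.20, "output": 0.40},
}


def blended(price, input_mtok=100.0, output_mtok=10.0):
    """Blended monthly cost for the 100M-in / 10M-out reference workload."""
    return price["input"] * input_mtok + price["output"] * output_mtok


def cheapest(cost_fn):
    """Return the model name with the lowest value of cost_fn."""
    return min(PRICES, key=lambda name: cost_fn(PRICES[name]))


cheapest_input = cheapest(lambda p: p["input"])
cheapest_output = cheapest(lambda p: p["output"])
cheapest_blended = cheapest(blended)

print("cheapest on input:  ", cheapest_input)
print("cheapest on output: ", cheapest_output)
print("cheapest blended:   ", cheapest_blended)
```

With these assumed prices, the model that wins on input price loses on the blended figure, which is the situation a single combined number is meant to expose.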

When should I trust the blended ranking vs running the calculator?

Trust the blended ranking for quick comparisons when your workload is somewhere in the RAG or document-analysis space and you haven't profiled your actual token ratios yet. Use the calculator when you know your real distribution, for example if your 30-day logs show a 3:1 input-to-output ratio because you run a generative content pipeline. The blended ranking also doesn't account for tiered pricing, caching discounts, or per-request minimums. If a provider offers a cached-token rate, the real cost for a heavy-reuse workload can be 40–60% lower than the base rate suggests.

Does the blended ranking include cached input tokens?

No. The reference workload assumes all input tokens are billed at the standard non-cached rate. Prompt caching is supported by several providers, typically at a 50–90% discount on repeated context, but the discount rules vary enough across providers that applying a single assumption would misrepresent most of them. The leaderboard's blended cost is therefore an upper bound for workloads that could take advantage of caching. If your workload reuses a large system prompt across many calls, check each provider's caching terms separately; that's likely where the biggest real-world savings are.
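For a rough sense of how caching would change the figure, the sketch below applies a flat discount to a cached share of input tokens. This is a simplifying assumption: real providers differ on cache-write surcharges, minimum prefix lengths, and expiry, none of which are modeled here. The function name, parameters, and example prices are hypothetical.

```python
def blended_cost_with_cache(price_input, price_output,
                            cached_fraction, cache_discount,
                            input_mtok=100.0, output_mtok=10.0):
    """Blended monthly cost in USD with a flat cached-token discount.

    cached_fraction: share of input tokens served from cache (0.0 to 1.0).
    cache_discount: discount on cached input tokens (0.5 means 50% off).
    Assumes no cache-write surcharge and no per-request minimums.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the discounted price.
    effective_input = price_input * (1.0 - cached_fraction * cache_discount)
    return effective_input * input_mtok + price_output * output_mtok


# Hypothetical prices in USD per million tokens.
base = blended_cost_with_cache(1.00, 2.00, cached_fraction=0.0, cache_discount=0.5)
cached = blended_cost_with_cache(1.00, 2.00, cached_fraction=0.8, cache_discount=0.5)

print(f"no caching:   ${base:.2f}/mo")
print(f"with caching: ${cached:.2f}/mo")
print(f"saving:       {100 * (1 - cached / base):.0f}%")
```

Setting `cached_fraction` to zero reproduces the leaderboard's uncached figure, which is why that figure acts as an upper bound under this model.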