Blended cost for a workload of 100M input tokens and 10M output tokens per month — the most realistic cost-of-ownership comparison for most production applications.
| # | Model | Family | Blended monthly cost | Providers | Last updated |
|---|---|---|---|---|---|
| 04 | Qwen 2.5 Coder 32B Instruct | qwen | $0.14/mo | 1 | May 16 |
| 05 | | qwen | $0.20/mo | 2 | May 16 |
| 06 | Qwen 2.5 72B Instruct | qwen | $0.21/mo | 3 | May 16 |
| 07 | Mixtral 8x7B Instruct | mixtral | $0.22/mo | 2 | May 16 |
| 08 | Qwen 3 72B Instruct | qwen | $0.25/mo | 4 | May 17 |
| 09 | Llama 3.3 70B Instruct | llama | $0.26/mo | 5 | May 17 |
| 10 | Llama 3.1 70B Instruct | llama | $0.26/mo | 3 | May 16 |
| 11 | DeepSeek V3 | deepseek | $0.28/mo | 3 | May 16 |
| 12 | DeepSeek R1 Distill Llama 70B | deepseek | $0.34/mo | 3 | May 16 |
| 13 | DeepSeek V3.2 | deepseek | $0.38/mo | 1 | May 17 |
| 14 | DeepSeek R1 | deepseek | $0.60/mo | 3 | May 16 |
| 15 | Mixtral 8x22B Instruct | mixtral | $0.67/mo | 4 | May 17 |
| 16 | Mistral Large 2 | mistral | $2.34/mo | 1 | May 16 |
| 17 | Llama 3.1 405B Instruct | llama | $3.05/mo | 4 | May 17 |
| 18 | Command R+ | command-r | $3.50/mo | 1 | May 16 |
The blended cost leaderboard calculates total monthly spend using a fixed reference workload of 100 million input tokens and 10 million output tokens per month. That 10:1 ratio reflects a common pattern for retrieval-augmented generation and document-processing pipelines, where large context windows inflate prompt size relative to completion length. The dollar figure shown is `(price_input × 100) + (price_output × 10)`, where both prices are in USD per million tokens and 100 and 10 are the monthly input and output volumes in millions of tokens. If your actual ratio differs substantially (say, you run a coding assistant where output volume is much higher), the calculator will give you a more accurate number.
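The formula above can be written as a short function. This is a minimal sketch: the function name `blended_monthly_cost`, its parameter names, and the example prices are assumptions made for illustration and are not taken from the leaderboard.

```python
def blended_monthly_cost(price_input, price_output, input_mtok=100.0, output_mtok=10.0):
    """Return the monthly USD cost of a workload measured in millions of tokens.

    price_input and price_output are USD per million tokens.
    The defaults are the reference workload: 100M input and 10M output tokens.
    """
    return price_input * input_mtok + price_output * output_mtok


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
example = blended_monthly_cost(price_input=0.50, price_output=1.50)
print(f"${example:.2f}/mo")  # prints $65.00/mo
```

Passing different `input_mtok` and `output_mtok` values reproduces the same calculation for a workload with another ratio.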
A model can rank cheapest on input but be expensive enough on output that it's not actually the best deal for any real workload. Ranking input and output prices separately gives you two numbers that require mental arithmetic to combine, and that arithmetic depends on an assumed ratio. By fixing the ratio at 10:1 and computing the blended cost, this leaderboard gives you a single comparable figure. It's a useful first filter, not a replacement for running your own numbers, but it's more actionable than reading two separate columns and doing the math in your head.
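To see how the rankings can disagree, here is a small sketch with two made-up models. The model names and prices are hypothetical and exist only to illustrate the point; they do not correspond to any row in the table.

```python
# Hypothetical models and prices in USD per million tokens (illustration only).
PRICES = {
    "model_a": {"input": 0.10, "output": 2.00},
    "model_b": {"input": 0.20, "output": 0.40},
}


def blended(price, input_mtok=100, output_mtok=10):
    """Blended monthly cost at the 10:1 reference workload."""
    return price["input"] * input_mtok + price["output"] * output_mtok


# Rank the same models three ways: input price, output price, blended cost.
by_input = sorted(PRICES, key=lambda m: PRICES[m]["input"])
by_output = sorted(PRICES, key=lambda m: PRICES[m]["output"])
by_blended = sorted(PRICES, key=lambda m: blended(PRICES[m]))

print("cheapest on input:  ", by_input[0])
print("cheapest on output: ", by_output[0])
print("cheapest blended:   ", by_blended[0])
```

With these assumed prices, `model_a` leads on input price, yet `model_b` comes out ahead once output tokens are included at the reference ratio.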
Trust the blended ranking for quick comparisons when your workload is somewhere in the RAG or document-analysis space and you haven't profiled your actual token ratios yet. Use the calculator when you know your real distribution — for example, if your 30-day logs show a 3:1 input-to-output ratio because you run a generative content pipeline. The blended ranking also doesn't account for tiered pricing, caching discounts, or per-request minimums. If a provider offers a cached-token rate, the real cost for a heavy-reuse workload can be 40–60% lower than the base rate suggests.
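If you do have logs, deriving your own ratio and cost is straightforward. The sketch below assumes each log record is an `(input_tokens, output_tokens)` pair; that record shape, the function names, and the sample numbers are all hypothetical, so adapt them to whatever your logging actually stores. It also uses base per-token prices only, without tiered pricing, caching discounts, or per-request minimums.

```python
def workload_from_logs(records):
    """Sum (input_tokens, output_tokens) pairs into totals and an input:output ratio."""
    total_in = sum(r[0] for r in records)
    total_out = sum(r[1] for r in records)
    # Guard against a log with no output tokens at all.
    ratio = total_in / total_out if total_out else float("inf")
    return total_in, total_out, ratio


def cost_for_workload(price_input, price_output, total_in, total_out):
    """Cost in USD; prices are USD per million tokens, totals are raw token counts."""
    return (price_input * total_in + price_output * total_out) / 1_000_000


# Hypothetical log sample with a 3:1 input-to-output ratio.
sample = [(3000, 1000), (6000, 2000)]
total_in, total_out, ratio = workload_from_logs(sample)
print(f"ratio {ratio:.1f}:1")
print(f"cost ${cost_for_workload(1.0, 2.0, total_in, total_out):.4f}")
```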
The blended cost does not include prompt-caching discounts: the reference workload assumes all input tokens are billed at the standard non-cached rate. Prompt caching is supported by several providers, typically at a 50–90% discount on repeated context, but the discount rules vary enough across providers that applying a single assumption would misrepresent most of them. The leaderboard's blended cost is therefore a ceiling for workloads that could take advantage of caching. If your workload reuses a large system prompt across many calls, check each provider's caching terms separately; that's likely where the biggest real-world savings are.
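For a rough sense of how much caching could move the number, the sketch below applies one flat discount to a chosen share of input tokens. This is a deliberately simplified model and an assumption of this example, not any provider's billing rule: `cached_fraction` and `cache_discount` are hypothetical parameters, and provider-specific terms are ignored.

```python
def blended_cost_with_caching(price_input, price_output, cached_fraction, cache_discount,
                              input_mtok=100.0, output_mtok=10.0):
    """Blended monthly cost when part of the input is billed at a discounted cached rate.

    cached_fraction: share of input tokens served from cache (0.0 to 1.0).
    cache_discount:  discount on cached tokens (0.5 means half price).
    Simplified model: one flat discount, no other caching fees or rules.
    """
    if not (0.0 <= cached_fraction <= 1.0 and 0.0 <= cache_discount <= 1.0):
        raise ValueError("cached_fraction and cache_discount must be between 0 and 1")
    cached = input_mtok * cached_fraction
    uncached = input_mtok - cached
    input_cost = price_input * (uncached + cached * (1.0 - cache_discount))
    return input_cost + price_output * output_mtok


# Hypothetical prices: 80% of input cached at a 50% discount.
base = blended_cost_with_caching(1.0, 2.0, cached_fraction=0.0, cache_discount=0.0)
cached = blended_cost_with_caching(1.0, 2.0, cached_fraction=0.8, cache_discount=0.5)
print(f"no caching: ${base:.2f}/mo, with caching: ${cached:.2f}/mo")
```

Under this model the uncached figure is always the upper bound, which matches the reading of the leaderboard number as a ceiling.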