Use-case preset

Side project (10M tokens/mo) cost calculator

10M tokens/mo scale; 70/30 input/output ratio, 4k context, best-effort latency.

At 10M tokens/month with a 70/30 input/output split and 4k average context, the monthly bill ranges from roughly $1–$2 on 7B–8B-class models (e.g. Llama 3.1 8B at ~$0.10–$0.18/M tokens, which works out to $1.00–$1.80) to $15–$40 on 70B-class models and $150+ on frontier 405B or closed-source equivalents. The 4k context reflects typical chat or lightweight generation tasks: short prompts, medium replies.
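The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the calculator's actual implementation: the function name `monthly_cost` and its parameters are hypothetical, and the example assumes a single flat rate for input and output tokens, whereas many providers price the two separately.

```python
def monthly_cost(total_tokens, input_share, input_price_per_m, output_price_per_m):
    """Return the monthly bill in dollars.

    total_tokens        -- tokens processed per month (input + output)
    input_share         -- fraction of tokens that are input, e.g. 0.70
    input_price_per_m   -- dollars per million input tokens
    output_price_per_m  -- dollars per million output tokens
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    dollars = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return dollars / 1_000_000


# Assumption: flat pricing at both ends of the ~$0.10-$0.18/M range quoted above.
low = monthly_cost(10_000_000, 0.70, 0.10, 0.10)
high = monthly_cost(10_000_000, 0.70, 0.18, 0.18)
print(f"${low:.2f} to ${high:.2f} per month")
```

With separate input and output prices, pass the two rates independently; the 70/30 split then weights the cheaper input rate more heavily.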

Start with the cheapest 8B model that meets your quality bar; the difference between an 8B and a 70B model is meaningful for coding or reasoning but negligible for FAQ answering or short-form generation. Best-effort latency means you can use batch endpoints or off-peak pricing where available. Expect a modest cache hit rate (20–30%); a stable system prompt helps. The upgrade trigger is usually quality degradation on a specific task type, not cost: 10M tokens/mo is cheap enough that moving one tier up rarely breaks a side-project budget.
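To see how much a 20–30% cache hit rate matters at this scale, here is a rough sketch. The 50% discount on cached input tokens is an assumption chosen for illustration, not a figure from any particular provider, and `cost_with_cache` is a hypothetical helper.

```python
def cost_with_cache(total_tokens, input_share, input_price_per_m,
                    output_price_per_m, cache_hit_rate, cached_discount):
    """Monthly bill in dollars when a share of input tokens is served from cache.

    cache_hit_rate   -- fraction of input tokens that hit the cache (0.0 to 1.0)
    cached_discount  -- fractional price reduction on cached input tokens
                        (assumed value; check your provider's pricing)
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cached_tokens = input_tokens * cache_hit_rate
    uncached_tokens = input_tokens - cached_tokens
    cached_price = input_price_per_m * (1 - cached_discount)
    dollars = (uncached_tokens * input_price_per_m
               + cached_tokens * cached_price
               + output_tokens * output_price_per_m)
    return dollars / 1_000_000


# Assumptions: $0.10/M flat rate, 25% hit rate, 50% discount on cached input.
baseline = cost_with_cache(10_000_000, 0.70, 0.10, 0.10, 0.0, 0.5)
cached = cost_with_cache(10_000_000, 0.70, 0.10, 0.10, 0.25, 0.5)
print(f"baseline ${baseline:.4f}, with cache ${cached:.4f}")
```

Under these assumptions the saving is under ten cents a month on an 8B-class model, which is why caching is a nice-to-have rather than a deciding factor at this volume.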

Recommended models

meta/llama-3.1-8b-instruct

Cheapest viable 8B; $1–2/mo at 10M tokens makes this the default starting point.

meta/llama-3.2-3b-instruct

3B for even lower cost on simple tasks; sub-$1/mo at 10M tokens.

google/gemma-2-9b-it

9B with strong general capability; good upgrade from 3B when quality matters more.

alibaba/qwen-3-8b-instruct

Competitive 8B quality with low per-token pricing; solid all-rounder for side projects.

meta/llama-3.3-70b-instruct

70B step-up for tasks that need stronger reasoning; ~$15–20/mo at 10M tokens.