Use-case preset
Side project (10M tokens/mo) cost calculator
10M tokens/mo scale; 70/30 input/output ratio, 4k context, best-effort latency.
At 10M tokens/month with a 70/30 input/output split and 4k average context, the monthly bill ranges from roughly $0.50–$1.80 on 7B–8B-class models (e.g. Llama 3.1 8B at ~$0.10–$0.18/M tokens, which comes to $1.00–$1.80 at this volume), through $15–$40 on 70B-class models, to $150+ on frontier 405B or closed-source equivalents. The 4k context reflects typical chat or lightweight generation tasks: short prompts, medium replies.
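The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `monthly_cost` is hypothetical, the flat rates of $0.10 and $0.18 per million tokens are the Llama 3.1 8B figures quoted above, and the split input/output prices in the last line are made-up placeholders to show how the 70/30 ratio matters once output tokens cost more than input tokens.

```python
def monthly_cost(total_tokens, input_share, input_price_per_m, output_price_per_m):
    """Return the monthly bill in dollars for a token volume and a price pair.

    Prices are in dollars per million tokens; input_share is the fraction
    of total tokens that are input (0.70 for a 70/30 split).
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


TOTAL = 10_000_000   # 10M tokens per month
INPUT_SHARE = 0.70   # 70/30 input/output split

# Flat rates quoted in the text for Llama 3.1 8B.
for rate in (0.10, 0.18):
    cost = monthly_cost(TOTAL, INPUT_SHARE, rate, rate)
    print(f"flat ${rate:.2f}/M -> ${cost:.2f}/mo")

# Hypothetical split pricing: $0.10/M input, $0.30/M output (placeholder values).
print(f"split pricing -> ${monthly_cost(TOTAL, INPUT_SHARE, 0.10, 0.30):.2f}/mo")
```

With a flat per-token rate the input/output ratio drops out of the total; it only changes the bill when a provider prices output tokens differently from input tokens.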
Start with the cheapest 8B model that meets your quality bar; the gap between an 8B and a 70B model is meaningful for coding or reasoning but negligible for FAQ answering or short-form generation. Best-effort latency means you can use batch endpoints or off-peak pricing where available. Expect a modest cache hit rate (20–30%); a stable system prompt helps. The upgrade trigger is usually quality degradation on a specific task type, not cost: 10M tokens/mo is cheap enough that one tier up rarely breaks a side-project budget.
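To see why a 20–30% cache hit rate only trims the bill at this scale, here is a rough sketch under stated assumptions. The function `cost_with_cache` and the 50% discount on cached input tokens are hypothetical; real cache pricing varies by provider and should be checked against its price sheet. The $0.10/M flat rate is the low-end 8B figure from the text.

```python
def cost_with_cache(total_tokens, input_share, price_per_m,
                    cache_hit_rate, cached_discount):
    """Monthly cost in dollars when part of the input is served from cache.

    cache_hit_rate:  fraction of input tokens that hit the cache (0.0 to 1.0).
    cached_discount: fraction taken off the price of cached input tokens
                     (0.5 means half price). Hypothetical; provider dependent.
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    # Cached tokens are billed at a reduced rate; output is never cached.
    billable = uncached + cached * (1 - cached_discount) + output_tokens
    return billable * price_per_m / 1_000_000


baseline = cost_with_cache(10_000_000, 0.70, 0.10, 0.0, 0.5)
cached = cost_with_cache(10_000_000, 0.70, 0.10, 0.25, 0.5)
print(f"no cache:      ${baseline:.4f}/mo")
print(f"25% hit rate:  ${cached:.4f}/mo")
print(f"saving:        ${baseline - cached:.4f}/mo")
```

Under these assumptions, caching saves under ten cents a month on an 8B model, which is consistent with treating quality rather than cost as the main upgrade trigger.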