Use-case preset
Production app (1B tokens/mo) cost calculator
SLA-grade production at 1B tokens/mo; caching and rate limits are critical.
At 1B tokens/month with a 70/30 input/output split, you're looking at $700–1,400/mo on typical mid-tier pricing, but a 50% prompt cache hit rate cuts that by 30–40% in practice. A p95 latency of about 2s is the threshold where users start noticing delays; staying under it requires either a fast model or a dedicated endpoint.
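To make the arithmetic concrete, here is a minimal sketch of the estimate in Python. It assumes the 70/30 split means 70% input and 30% output tokens, and the per-million-token prices are hypothetical values picked only to bracket the $700–1,400 range quoted above, not any provider's actual rates. The 30–40% caching reduction is taken directly from the text rather than derived from a cache pricing model.

```python
def monthly_cost(total_tokens, input_share, input_price_per_m, output_price_per_m):
    """Monthly spend in USD for a token volume and an input/output split.

    Prices are in USD per 1M tokens.
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


TOTAL_TOKENS = 1_000_000_000  # 1B tokens/month, from the text
INPUT_SHARE = 0.70            # assumption: "70/30" means 70% input, 30% output

# Hypothetical (input, output) prices per 1M tokens, chosen to bracket the quoted range.
low = monthly_cost(TOTAL_TOKENS, INPUT_SHARE, 0.40, 1.40)
high = monthly_cost(TOTAL_TOKENS, INPUT_SHARE, 0.80, 2.80)

# Apply the 30-40% reduction the text attributes to a 50% cache hit rate.
low_cached = low * (1 - 0.40)    # cheapest pricing, largest reduction
high_cached = high * (1 - 0.30)  # priciest pricing, smallest reduction

print(f"Before caching: ${low:,.0f} to ${high:,.0f} per month")
print(f"After caching:  ${low_cached:,.0f} to ${high_cached:,.0f} per month")
```

Swapping in your provider's published input and output prices, and your own measured split, turns this into a first-pass budget figure.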
SLAs matter here: a single provider outage means user-facing downtime, so multi-provider failover or reserved capacity becomes worth the overhead. Rate limits are a real constraint at this scale: verify your provider's tokens-per-minute (TPM) ceiling before making architecture decisions, not after. The monthly bill is now large enough to warrant quarterly pricing negotiations with your provider. Prompt cache amortization is the single biggest lever: every percentage point of cache hit rate saves roughly $5–10/mo at this volume.
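Two of these numbers are easy to sanity-check with a few lines of Python. The sketch below assumes a 30-day month, a hypothetical peak-to-average traffic factor of 5, and savings that scale linearly with cache hit rate. None of those assumptions comes from a provider; replace them with your own measurements.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: 30-day month


def average_tpm(tokens_per_month):
    """Average tokens per minute if traffic were perfectly flat."""
    return tokens_per_month / MINUTES_PER_MONTH


def required_tpm(tokens_per_month, peak_factor):
    """TPM headroom needed when peak traffic is peak_factor times the average."""
    return average_tpm(tokens_per_month) * peak_factor


def savings_per_cache_point(monthly_bill, reduction, hit_rate_points):
    """USD saved per percentage point of hit rate, assuming linear scaling."""
    return monthly_bill * reduction / hit_rate_points


TOKENS_PER_MONTH = 1_000_000_000  # 1B tokens/month, from the text
PEAK_FACTOR = 5                   # hypothetical peak-to-average ratio

flat = average_tpm(TOKENS_PER_MONTH)
peak = required_tpm(TOKENS_PER_MONTH, PEAK_FACTOR)
print(f"Average load: {flat:,.0f} TPM; at {PEAK_FACTOR}x peak: {peak:,.0f} TPM")

# Bounds from the text: a $700-1,400 bill cut by 30-40% at a 50% hit rate.
per_point_low = savings_per_cache_point(700, 0.30, 50)
per_point_high = savings_per_cache_point(1400, 0.40, 50)
print(f"Savings per hit-rate point: ${per_point_low:.2f} to ${per_point_high:.2f}")
```

Compare the peak figure against the TPM ceiling your provider publishes for your tier. The per-point bounds work out to about $4.20 and $11.20, which brackets the rough $5–10 figure above.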