Use-case preset
Consumer chatbot at scale cost calculator
Public-facing chatbot for millions of users; unit economics matter.
A public-facing chatbot serving millions of users: short user messages, brief assistant replies, high concurrency. The 70/30 input/output split and 4k context window reflect typical conversational turns: users rarely send walls of text, and responses stay focused. The 2s p95 latency requirement is a hard UX ceiling; above it, users perceive lag and abandon the conversation.
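Here is a minimal sketch of the per-turn arithmetic behind the 70/30 split. It assumes hypothetical prices of $0.50 and $1.50 per million input and output tokens and a 1,000-token turn. It also treats "4k" as 4,096 tokens and checks that a whole turn fits in that window. `Pricing`, `cost_per_turn`, and `monthly_cost` are illustrative names, not part of any calculator API.

```python
from dataclasses import dataclass

# The preset's "4k" context window, taken here as 4,096 tokens (assumption).
CONTEXT_WINDOW = 4096


@dataclass(frozen=True)
class Pricing:
    """Hypothetical list prices in USD per 1M tokens."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_turn(tokens_per_turn: int, input_share: float, pricing: Pricing) -> float:
    """USD cost of one turn, splitting its tokens into input and output by input_share."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    if tokens_per_turn > CONTEXT_WINDOW:
        raise ValueError("turn does not fit in the context window")
    input_tokens = tokens_per_turn * input_share
    output_tokens = tokens_per_turn - input_tokens
    return (input_tokens * pricing.input_per_mtok
            + output_tokens * pricing.output_per_mtok) / 1_000_000


def monthly_cost(turns_per_month: int, tokens_per_turn: int,
                 input_share: float, pricing: Pricing) -> float:
    """Scale the per-turn cost to a month of traffic."""
    return turns_per_month * cost_per_turn(tokens_per_turn, input_share, pricing)


if __name__ == "__main__":
    prices = Pricing(input_per_mtok=0.50, output_per_mtok=1.50)  # hypothetical prices
    print(f"per turn: ${cost_per_turn(1_000, 0.70, prices):.6f}")
    print(f"10M turns/month: ${monthly_cost(10_000_000, 1_000, 0.70, prices):,.2f}")
```

Under these assumed prices, the 30% output share accounts for 450 of the 800 millionths of a dollar spent per turn. This is why keeping replies brief matters as much as trimming prompts.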
Unit economics dominate at this scale. A 50% cached prompt percentage captures the stable system prompt repeated on every turn. Because only the cached half is discounted, this cuts effective input cost by at most half, and it approaches that figure only when the provider bills cached tokens at a small fraction of the normal rate. Quantized inference (fp8/int4) is usually acceptable, since users rarely notice quality differences in casual chat. Watch rate-limit headroom across providers: a single provider's burst ceiling can cap your concurrency before cost does. Prefer models with published per-token pricing over subscription tiers to keep cost per session predictable.
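The caching and headroom arithmetic can be sketched as follows. The sketch assumes a single discount applied uniformly to cached input tokens and treats a tokens-per-minute (TPM) limit as the binding rate limit. `effective_input_price`, `rate_limit_headroom`, and all example figures are hypothetical. Some providers also charge a premium for writing to the cache or enforce requests-per-minute limits, and this sketch models neither.

```python
def effective_input_price(price_per_mtok: float, cached_fraction: float,
                          cached_discount: float) -> float:
    """Blended USD per 1M input tokens when part of each prompt is a cache hit.

    cached_fraction: share of input tokens served from the prompt cache (0..1).
    cached_discount: discount on cached tokens (0.9 means they cost 10% of the
    normal rate). Both values are assumptions that vary by provider.
    """
    for name, value in (("cached_fraction", cached_fraction),
                        ("cached_discount", cached_discount)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    uncached = (1.0 - cached_fraction) * price_per_mtok
    cached = cached_fraction * price_per_mtok * (1.0 - cached_discount)
    return uncached + cached


def rate_limit_headroom(turns_per_minute: float, tokens_per_turn: int,
                        provider_tpm_limit: float) -> float:
    """Unused fraction of a provider's TPM limit; negative means over the limit."""
    if provider_tpm_limit <= 0:
        raise ValueError("provider_tpm_limit must be positive")
    used = turns_per_minute * tokens_per_turn
    return 1.0 - used / provider_tpm_limit


if __name__ == "__main__":
    base = 0.50  # hypothetical USD per 1M input tokens
    for discount in (0.9, 0.5):
        eff = effective_input_price(base, cached_fraction=0.5, cached_discount=discount)
        print(f"discount {discount:.0%}: ${eff:.4f}/M, saving {1 - eff / base:.0%}")
    # 2,000 turns/min at 1,000 tokens each against a hypothetical 4M TPM cap
    print(f"headroom: {rate_limit_headroom(2_000, 1_000, 4_000_000):.0%}")
```

With half the prompt cached, a 90% discount saves about 45% of input spend, while a 50% discount saves only 25%. A negative headroom value means the projected load already exceeds that provider's limit, so traffic would need to be spread across providers.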