Use-case preset

Customer support chatbot cost calculator

Real-time chatbot answering customer questions over live chat or email.

A live chat or help-desk bot receives a user message plus conversation history (typically 3–8 prior turns) and returns a short, direct answer. At moderate scale — say, 50k conversations/day — token costs dominate infrastructure spend.
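As a rough illustration of why input tokens pile up, the sketch below estimates the prompt size of a single request from the pieces named above. The function name and all token counts are hypothetical placeholders, not values taken from the calculator; measure real sizes with your model's tokenizer.

```python
def estimate_prompt_tokens(
    system_tokens: int,
    prior_turns: int,
    avg_turn_tokens: int,
    user_tokens: int,
) -> int:
    """Estimate input tokens for one request.

    Every request resends the system prompt and the conversation
    history, so the prompt grows linearly with the number of prior turns.
    """
    if min(system_tokens, prior_turns, avg_turn_tokens, user_tokens) < 0:
        raise ValueError("token counts and turn counts must be non-negative")
    return system_tokens + prior_turns * avg_turn_tokens + user_tokens


if __name__ == "__main__":
    # Hypothetical sizes: 500-token system prompt, 120 tokens per prior
    # turn, 60-token user message. The 3 and 8 turn bounds come from the text.
    for turns in (3, 8):
        print(turns, estimate_prompt_tokens(500, turns, 120, 60))
```

Under these assumed sizes, history alone accounts for a large share of the prompt at the upper end of the 3–8 turn range.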

The 70/30 input/output split reflects the conversational pattern: the system prompt, conversation history, and the user's question together outweigh the concise reply. A 4k context window covers all but the longest threads. The 2s p95 latency target keeps the experience snappy without requiring a sub-second inference tier. Set `cachedPromptPercent` to ~40 because the system prompt and FAQ context repeat across turns; cache hits cut blended cost by roughly 30–40%, though the exact saving depends on how steeply your provider discounts cached input tokens. Smaller quantized models (7B–13B) are worth testing here: instruction-following quality matters more than reasoning depth, and the latency gains at those sizes often offset any quality delta.
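One way to see how the split and the cache setting interact is a small blended-price model. This is a minimal sketch under stated assumptions, not the calculator's actual formula: the function name, the `cached_discount` parameter (the fraction taken off the input price for cached tokens), and its default value are all hypothetical, and `cached_prompt_percent` merely mirrors the `cachedPromptPercent` setting.

```python
def blended_cost_per_million(
    input_price: float,
    output_price: float,
    input_share: float = 0.70,
    cached_prompt_percent: float = 40.0,
    cached_discount: float = 0.5,
) -> float:
    """Blended USD price per 1M tokens for a given input/output mix.

    input_share           fraction of all tokens that are input (0.70 = 70/30 split)
    cached_prompt_percent percent of input tokens served from the prompt cache
    cached_discount       assumed fraction taken off the input price for cached
                          tokens (hypothetical; check your provider's pricing)
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    if not 0.0 <= cached_prompt_percent <= 100.0:
        raise ValueError("cached_prompt_percent must be between 0 and 100")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")

    cached_fraction = cached_prompt_percent / 100.0
    # Cached input tokens are billed at a reduced rate; the rest at full price.
    effective_input_price = input_price * (1.0 - cached_fraction * cached_discount)
    return input_share * effective_input_price + (1.0 - input_share) * output_price


if __name__ == "__main__":
    # Placeholder prices in USD per 1M tokens, not quotes for any real model.
    without_cache = blended_cost_per_million(0.40, 0.40, cached_prompt_percent=0.0)
    with_cache = blended_cost_per_million(0.40, 0.40)
    print(f"saving: {1.0 - with_cache / without_cache:.1%}")
```

In this model the saving scales with the cached share, the input share, and the discount together, so the assumed discount has a large effect on the result and is worth checking against your provider's published rates.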

Recommended models

meta/llama-3.3-70b-instruct

Strong instruction-following at ~$0.40 per 1M tokens (blended); the default safe choice for production support bots.

alibaba/qwen-2.5-72b-instruct

Comparable quality to Llama 3.3 70B with better multilingual coverage for global support deployments.

mistralai/mistral-small-3

Low-latency, cost-efficient option that handles common support FAQs well at a fraction of 70B pricing.

meta/llama-3.1-8b-instruct

Fast and cheap for high-volume deployments where support topics are narrow and well-defined.