Use-case preset
Customer support chatbot cost calculator
Real-time chatbot answering customer questions over live chat or email.
A live chat or help-desk bot receives a user message plus conversation history (typically 3–8 prior turns) and returns a short, direct answer. At moderate scale (say, 50k conversations/day), token costs dominate infrastructure spend.
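To make the scale concrete, here is a rough sketch of how per-request input size and daily spend could be estimated. The system prompt size, tokens per turn, requests per conversation, and both prices are placeholder assumptions chosen for illustration; they are not measured values or any provider's real pricing.

```python
def request_input_tokens(system_tokens, prior_turns, tokens_per_turn, user_tokens):
    """Input size of one request: the history is resent with every new message."""
    return system_tokens + prior_turns * tokens_per_turn + user_tokens


def daily_cost_usd(conversations_per_day, requests_per_conversation,
                   input_tokens, output_tokens,
                   input_price_per_m, output_price_per_m):
    """Daily token spend in USD, with prices given per 1M tokens."""
    requests = conversations_per_day * requests_per_conversation
    per_request = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return requests * per_request / 1_000_000


# Assumed request shape: 400-token system prompt, 5 prior turns of ~120 tokens
# each, and a 50-token user message.
input_tokens = request_input_tokens(400, 5, 120, 50)

# Assumed traffic and prices: 4 bot replies per conversation, 450 output tokens
# per reply, hypothetical prices of $0.50 / $1.50 per 1M input / output tokens.
cost = daily_cost_usd(50_000, 4, input_tokens, 450, 0.50, 1.50)
print(f"{input_tokens} input tokens per request, ${cost:,.2f} per day")
```

Swapping in real traffic figures and the chosen provider's prices gives a first-order daily budget.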
The 70/30 input/output split reflects the conversational pattern: system prompt, conversation history, and the user's question together outweigh the concise reply. A 4k context window covers all but the longest threads. The 2s p95 latency target keeps the experience snappy without requiring a sub-second inference tier. Set `cachedPromptPercent` to ~40 because the system prompt and FAQ context repeat across turns; with 40% of the prompt cached, input-token cost drops by roughly 30–40%, depending on the provider's cached-token discount, and the saving on the total bill is smaller because output tokens are not cached. Smaller quantized models (7B–13B) are worth testing here: instruction-following quality matters more than reasoning depth, and the latency gains at those sizes often offset any quality delta.
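How much caching actually saves depends on the provider's cached-token discount and on how output tokens are priced relative to input tokens. The sketch below computes the saving on the input side and on the whole request so the figure can be checked against real prices. The 700/300 request shape mirrors the 70/30 split; the prices and the 90% cached-token discount are hypothetical, and `request_cost_usd` is an illustrative helper, not part of the calculator.

```python
def request_cost_usd(input_tokens, output_tokens, cached_prompt_percent,
                     input_price_per_m, output_price_per_m, cached_discount):
    """Cost of one request in USD when part of the prompt is served from cache.

    cached_discount is the fraction taken off the input price for cached
    tokens (0.9 means cached tokens cost 10% of the normal input price).
    """
    cached = input_tokens * cached_prompt_percent / 100
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - cached_discount)) * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000


def saving_percent(baseline, discounted):
    """Relative saving of `discounted` versus `baseline`, in percent."""
    return 100 * (baseline - discounted) / baseline


# Placeholder request shape and prices (assumptions, not real pricing).
params = dict(input_tokens=700, output_tokens=300,
              input_price_per_m=1.00, output_price_per_m=1.00,
              cached_discount=0.9)

# Saving on the whole request (input + output).
total_saving = saving_percent(
    request_cost_usd(cached_prompt_percent=0, **params),
    request_cost_usd(cached_prompt_percent=40, **params),
)

# Saving on the input side only: same request with output tokens zeroed out.
input_only = dict(params, output_tokens=0)
input_saving = saving_percent(
    request_cost_usd(cached_prompt_percent=0, **input_only),
    request_cost_usd(cached_prompt_percent=40, **input_only),
)

print(f"input-side saving: {input_saving:.1f}%, whole-request saving: {total_saving:.1f}%")
```

Under these placeholder values the input side drops by 36%, while the whole-request saving is smaller (about 25%) because output tokens are never cached. A higher output price narrows the whole-request saving further.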