Use-case preset
Multi-turn customer service cost calculator
Long support conversations with case history and KB chunks.
Multi-turn support conversations typically include a persistent system prompt with agent guidelines, the customer's case history, and 2–5 relevant knowledge-base chunks retrieved per turn. Output is short: acknowledgments, follow-up questions, or resolution steps. The 85/15 input/output ratio reflects that context dominates token spend.
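As a rough illustration of the 85/15 split, the sketch below divides a token budget into input and output shares. The function name `split_tokens` and the 10,000-token example are hypothetical, chosen only to show the arithmetic; they are not part of the calculator itself.

```python
def split_tokens(total_tokens, input_share=0.85):
    """Split a token budget into (input, output) counts.

    input_share=0.85 mirrors the preset's 85/15 input/output ratio.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = round(total_tokens * input_share)
    # Output takes whatever remains, so the two parts always sum to the total.
    output_tokens = total_tokens - input_tokens
    return input_tokens, output_tokens


# Hypothetical budget of 10,000 tokens for one conversation.
input_tokens, output_tokens = split_tokens(10_000)
print(input_tokens, output_tokens)
```

With most of the budget on the input side, anything that lowers the price of input tokens has a much larger effect on the total than trimming output length.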
Context is set to 16k tokens to hold a full case thread plus KB snippets without truncation. Prompt caching can save 50–70% on input costs because the system prompt and KB chunks repeat across every turn in a session. Latency is best-effort; most support queues can tolerate a few seconds. The main cost lever here is cache hit rate: segment sessions cleanly and keep the cached prefix stable. Watch for cache misses caused by dynamic timestamps or session metadata injected before the KB chunks, since any change ahead of the cached content breaks the prefix match.
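The relationship between cache hit rate and input savings can be sketched as below, together with one way to keep the cached prefix stable. Everything here is an assumption for illustration: the function names, the 70% cacheable share, and the 90% discount on cached reads are hypothetical and do not reflect any specific provider's pricing or matching rules. The SHA-256 hash is only a stand-in for checking that the prefix is byte-identical across turns.

```python
import hashlib


def input_savings(cached_fraction, hit_rate, cached_discount):
    """Fraction of input cost saved by caching.

    cached_fraction: share of input tokens that sit in the cacheable prefix.
    hit_rate:        share of turns where that prefix is actually served from cache.
    cached_discount: price reduction on cached tokens (assumed value, not a real price).
    """
    return cached_fraction * hit_rate * cached_discount


def build_prompt(system_prompt, kb_chunks, session_metadata, user_turn):
    """Return (prefix, suffix) with stable content first and per-turn values last."""
    prefix = system_prompt + "\n" + "\n".join(kb_chunks)
    suffix = session_metadata + "\n" + user_turn
    return prefix, suffix


def prefix_key(prefix):
    """Hash of the prefix; identical keys mean the prefix did not change."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


# Two turns in the same session: only the timestamp and user message differ.
kb = ["Refund policy: ...", "Shipping delays: ..."]
p1, _ = build_prompt("You are a support agent.", kb, "ts=10:00:01", "Where is my order?")
p2, _ = build_prompt("You are a support agent.", kb, "ts=10:00:45", "Can I get a refund?")
print(prefix_key(p1) == prefix_key(p2))  # the prefix is unchanged across turns

# Savings fall in proportion to the hit rate.
print(input_savings(0.7, 1.0, 0.9), input_savings(0.7, 0.5, 0.9))
```

Because savings scale linearly with hit rate in this model, halving the hit rate halves the saving. If the timestamp were placed ahead of the KB chunks instead, the prefix would differ on every turn and the hit rate would drop accordingly.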