Use-case preset

Multi-turn customer service cost calculator

Long support conversations with case history and KB chunks.

Multi-turn support conversations typically include a persistent system prompt with agent guidelines, the customer's case history, and 2–5 relevant knowledge-base chunks retrieved per turn. Output is short: acknowledgments, follow-up questions, or resolution steps. The 85/15 input/output ratio reflects that context dominates token spend.

Context is set to 16k to hold a full case thread plus KB snippets without truncation. Prompt caching saves 50–70% on input costs because the system prompt and KB chunks repeat across every turn in a session. Latency is best-effort; most support queues can tolerate a few seconds. The main cost lever here is cache hit rate — segment sessions cleanly and keep the cached prefix stable. Watch for cache misses caused by dynamic timestamps or session metadata injected before the KB chunks.

Recommended models

meta/llama-3.3-70b-instruct

Strong instruction following and long-context coherence at mid-tier cost; good default for support workloads.

alibaba/qwen-2.5-72b-instruct

Competitive input pricing with solid multi-turn consistency; handles KB chunk injection well.

mistralai/mistral-large-2

Reliable instruction adherence and low hallucination rate on factual KB retrieval.

cohere/command-r-plus

Built-in RAG tooling and native prompt caching support make it a natural fit for KB-augmented support.

deepseek/deepseek-v3

Low input cost with strong reasoning; useful when ticket complexity requires nuanced responses.