Use-case preset
Intent classification cost calculator
Map a user utterance to one of N intents, in batch or near-real-time.
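Each request sends a single user utterance (10–80 tokens) along with a stable system prompt that holds the intent definitions and few-shot examples. The model returns one intent label, optionally with a confidence score, and rarely more than 20 tokens. Because the shared system prompt dwarfs both the utterance and the answer, the preset uses a 95/5 input/output split. The 1k-token context window fits even verbose taxonomies of up to ~80 classes with short one-line descriptions; adding 2–3 few-shot examples per class at that scale overflows it, so plan for fewer classes or a larger window in that case.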
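A quick token-budget check makes the split and the 1k limit concrete. This is a minimal sketch: `IntentPromptBudget` is a hypothetical helper, and the per-description, per-example, and instruction token sizes are assumptions chosen for illustration, not measured values. It treats "1k" as 1,000 tokens.

```python
from dataclasses import dataclass


@dataclass
class IntentPromptBudget:
    """Hypothetical per-request token budget for one classification call."""

    n_classes: int
    tokens_per_description: int   # assumed size of one intent definition
    examples_per_class: int       # few-shot examples per intent
    tokens_per_example: int       # assumed size of one few-shot example
    instruction_tokens: int = 60  # assumed fixed task instructions
    utterance_tokens: int = 80    # upper end of the 10-80 token range
    output_tokens: int = 20       # label plus optional confidence score

    def system_prompt_tokens(self) -> int:
        # Shared prefix: instructions plus every class definition and its examples.
        per_class = self.tokens_per_description + self.examples_per_class * self.tokens_per_example
        return self.instruction_tokens + self.n_classes * per_class

    def input_tokens(self) -> int:
        return self.system_prompt_tokens() + self.utterance_tokens

    def total_tokens(self) -> int:
        return self.input_tokens() + self.output_tokens

    def input_share(self) -> float:
        return self.input_tokens() / self.total_tokens()

    def fits(self, context_window: int = 1000) -> bool:
        return self.total_tokens() <= context_window


one_liners = IntentPromptBudget(n_classes=80, tokens_per_description=10,
                                examples_per_class=0, tokens_per_example=0)
with_examples = IntentPromptBudget(n_classes=80, tokens_per_description=10,
                                   examples_per_class=2, tokens_per_example=12)
for name, budget in [("80 one-liners", one_liners), ("80 classes + 2 examples", with_examples)]:
    print(f"{name}: {budget.total_tokens()} tokens, fits 1k: {budget.fits()}, "
          f"input share {budget.input_share():.0%}")
# 80 one-liners: 960 tokens, fits 1k: True, input share 98%
# 80 classes + 2 examples: 2880 tokens, fits 1k: False, input share 99%
```

The preset's 95/5 figure is a round default; with these assumed sizes the input share comes out even higher, which only strengthens the case for caching the shared prefix.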
This is a batch or near-real-time classification workload where per-call cost matters more than marginal accuracy gains. The system prompt and intent list are identical across every call, so prompt-cache hit rates of 50–70% are realistic. On a cache hit, the shared prefix is billed at the provider's discounted cached-input rate, which cuts the input side of the bill substantially; how close total cost gets to the output-token floor depends on the size of that discount. Small models (1B–8B parameters) perform competitively when the system prompt contains clear definitions and 2–3 examples per class. Validate with a confusion matrix before deploying to production; ambiguous intents usually need rebalanced examples, not a larger model.
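The effect of caching on per-call cost can be sketched with a simple expected-value model. This is an illustration under stated assumptions, not any provider's billing formula: `cost_per_call` is a hypothetical helper, the prices below are placeholders, cached input is modeled as a flat percentage discount, and cache-write surcharges (which some providers charge) are ignored.

```python
def cost_per_call(
    input_tokens: int,
    output_tokens: int,
    cacheable_tokens: int,
    cache_hit_rate: float,
    input_price_per_m: float,
    output_price_per_m: float,
    cached_discount: float,
) -> float:
    """Expected cost of one call under a simplified prompt-caching model.

    Prices are per million tokens. On a cache hit, the cacheable prefix is
    billed at input_price_per_m * (1 - cached_discount); on a miss it is
    billed at the full input price. Cache-write surcharges are ignored.
    """
    if not 0 <= cacheable_tokens <= input_tokens:
        raise ValueError("cacheable_tokens must be between 0 and input_tokens")
    for name, rate in (("cache_hit_rate", cache_hit_rate), ("cached_discount", cached_discount)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    uncached_tokens = input_tokens - cacheable_tokens
    # Expected per-token price of the shared prefix, averaged over hits and misses.
    prefix_price = input_price_per_m * (1 - cache_hit_rate * cached_discount)
    input_cost = uncached_tokens * input_price_per_m + cacheable_tokens * prefix_price
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000


# Placeholder prices, not any provider's real rates; 950/50 mirrors the 95/5 split.
params = dict(input_tokens=950, output_tokens=50, cacheable_tokens=900,
              input_price_per_m=0.10, output_price_per_m=0.40, cached_discount=0.9)
no_cache = cost_per_call(cache_hit_rate=0.0, **params)
with_cache = cost_per_call(cache_hit_rate=0.6, **params)
print(f"per 1M calls: ${no_cache * 1e6:.2f} uncached, ${with_cache * 1e6:.2f} at 60% hits")
# per 1M calls: $115.00 uncached, $66.40 at 60% hits
```

With these placeholder numbers, caching cuts total cost by roughly 40%, yet input still accounts for about 46.4 of the 66.4 cost units, so the result sits well above the output-only floor of 20. A deeper cached-token discount or a higher hit rate moves it closer.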
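A confusion matrix over a labeled holdout set shows which intents the model actually mixes up. The sketch below uses plain Python so it runs without dependencies (scikit-learn's `sklearn.metrics.confusion_matrix` does the same job if it is already available); the function names, the `<unknown>` column for off-list model outputs, and the sample data are illustrative assumptions.

```python
UNKNOWN = "<unknown>"  # column for predictions that are not a valid intent label


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true intents, columns are predicted intents plus UNKNOWN."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    columns = list(labels) + [UNKNOWN]
    row_index = {label: i for i, label in enumerate(labels)}
    col_index = {label: j for j, label in enumerate(columns)}
    matrix = [[0] * len(columns) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        # Gold labels must be in `labels`; off-list model outputs land in UNKNOWN.
        j = col_index.get(pred_label, col_index[UNKNOWN])
        matrix[row_index[true_label]][j] += 1
    return columns, matrix


def most_confused(matrix, labels, columns, top_k=3):
    """Largest off-diagonal cells as (true, predicted, count), biggest first."""
    cells = [
        (count, labels[i], columns[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    cells.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(t, p, n) for n, t, p in cells[:top_k]]


labels = ["billing", "cancel", "refund"]
y_true = ["billing", "billing", "cancel", "refund", "refund", "refund", "cancel"]
y_pred = ["billing", "refund", "cancel", "billing", "refund", "billing", "cancellation"]
columns, matrix = confusion_matrix(y_true, y_pred, labels)
for true_label, pred_label, count in most_confused(matrix, labels, columns):
    print(f"{true_label} -> {pred_label}: {count}")
# refund -> billing: 2
# billing -> refund: 1
# cancel -> <unknown>: 1
```

In this toy data, `refund` predicted as `billing` is the kind of pair that usually calls for clearer definitions or rebalanced examples for those two intents, while hits in the `<unknown>` column point to output-format problems rather than confusion between classes.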