Use-case preset

Content moderation cost calculator

Classify user-generated content at high throughput; tiny output, large volume.

A high-throughput classifier that reads user-generated content and emits one of three labels: safe, flag, or block. The prompt is the content plus a short policy rubric; the output is a structured label plus optional reasoning. The 95/5 input/output ratio reflects that: nearly all tokens are input, and the reply is tiny. The 1k-token context cap enforces a strict input budget: content that exceeds it is truncated or rejected at the pipeline level.
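The budget check and the label contract described above can be sketched as follows. This is a minimal illustration, not the calculator's own pipeline: the function names, the JSON reply shape with "label" and "reasoning" keys, the 50-token output reserve (5% of the 1k cap), and the whitespace token count are all assumptions. A real deployment would count tokens with the chosen model's tokenizer.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_LABELS = {"safe", "flag", "block"}
MAX_CONTEXT_TOKENS = 1000   # the 1k cap from the preset
RESERVED_OUTPUT_TOKENS = 50  # assumption: 5% of the cap kept for the reply


def count_tokens(text: str) -> int:
    """Crude whitespace estimate; swap in the model's tokenizer in practice."""
    return len(text.split())


def fit_to_budget(content: str, rubric: str, reject: bool = False) -> Optional[str]:
    """Return content that fits the input budget, or None if it is rejected."""
    budget = MAX_CONTEXT_TOKENS - RESERVED_OUTPUT_TOKENS - count_tokens(rubric)
    if budget <= 0:
        raise ValueError("rubric alone exceeds the context cap")
    words = content.split()
    if len(words) <= budget:
        return content
    if reject:
        return None
    # Truncate: keep only the first `budget` whitespace-delimited tokens.
    return " ".join(words[:budget])


@dataclass
class Verdict:
    label: str
    reasoning: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON reply and insist on one of the three allowed labels."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = str(data.get("label", "")).strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return Verdict(label=label, reasoning=data.get("reasoning"))
```

Rejecting unknown labels outright, rather than mapping them to a default, keeps a malformed reply from silently passing as safe.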

At moderation volumes (millions of items per day), input cost dominates the bill. The 50% cache rate accounts for the stable policy rubric that is injected on every call. The key trade-off is accuracy versus throughput: a tiny model (1–3B parameters) maximizes throughput and minimizes cost but may miss edge cases. Before deploying a small model, run a precision/recall benchmark against your policy, because false negatives in moderation carry real reputational and legal risk.
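To see why input cost dominates, the arithmetic can be sketched as below. The prices are placeholder values, not quotes for any listed model, and the sketch assumes the cache rate applies to input tokens only and that cached tokens are billed at a separate, lower rate. The Pricing class and daily_cost function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    input_per_million: float         # USD per 1M uncached input tokens
    cached_input_per_million: float  # USD per 1M cached input tokens
    output_per_million: float        # USD per 1M output tokens


def daily_cost(items_per_day: int, tokens_per_item: int, pricing: Pricing,
               input_share: float = 0.95, cache_rate: float = 0.50) -> dict:
    """Split daily tokens by the input/output ratio and price each bucket."""
    if not 0.0 <= input_share <= 1.0 or not 0.0 <= cache_rate <= 1.0:
        raise ValueError("input_share and cache_rate must be within [0, 1]")
    total_tokens = items_per_day * tokens_per_item
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cached_tokens = input_tokens * cache_rate
    uncached_tokens = input_tokens - cached_tokens
    input_cost = (uncached_tokens * pricing.input_per_million
                  + cached_tokens * pricing.cached_input_per_million) / 1_000_000
    output_cost = output_tokens * pricing.output_per_million / 1_000_000
    return {"input": input_cost, "output": output_cost,
            "total": input_cost + output_cost}


# Placeholder prices for illustration only.
example_pricing = Pricing(input_per_million=0.05,
                          cached_input_per_million=0.025,
                          output_per_million=0.10)
example = daily_cost(5_000_000, 1_000, example_pricing)
print({name: round(value, 2) for name, value in example.items()})
```

With these placeholder prices, 5 million items a day at 1,000 tokens each comes to roughly $203 per day, of which about $178 is input.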

Recommended models

meta/llama-3.2-3b-instruct

Ultra-low cost per classification; fast throughput for high-volume moderation pipelines.

google/gemma-2-2b-it

Smallest viable model with reasonable classification accuracy; minimizes cost at scale.

ibm/granite-3.1-8b-instruct

Built for enterprise classification tasks; good accuracy on policy-adherence scoring.

meta/llama-3.1-8b-instruct

Better accuracy than 3B models when precision/recall thresholds require it.

mistralai/mistral-small-3

Strong structured output reliability for label+reasoning format at modest cost.
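When choosing among these models, the precision/recall benchmark mentioned earlier can be as simple as the sketch below, run once per candidate on a hand-labeled sample of your own content. The function names and the 0.9 recall threshold are illustrative assumptions, not recommendations from this preset.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("safe", "flag", "block")


def precision_recall(gold: Sequence[str],
                     predicted: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-label precision and recall from paired gold and predicted labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong for this item
            fn[g] += 1  # true label was missed for this item
    report = {}
    for label in LABELS:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        report[label] = {
            "precision": tp[label] / p_den if p_den else 0.0,
            "recall": tp[label] / r_den if r_den else 0.0,
        }
    return report


def meets_recall(report: Dict[str, Dict[str, float]],
                 label: str = "block", min_recall: float = 0.9) -> bool:
    """Check recall on one label; missed 'block' items are the false negatives."""
    return report[label]["recall"] >= min_recall


gold = ["safe", "safe", "flag", "block", "block", "block"]
pred = ["safe", "flag", "flag", "block", "safe", "block"]
report = precision_recall(gold, pred)
print(report["block"], meets_recall(report))
```

Recall on block is the number to watch first, since each missed item there is content that should have been removed but was not.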