Use-case preset

E-commerce product search cost calculator

Query understanding and ranking for interactive product search under 2s.

Each request is a short user query plus a small product catalog slice or ranking context; a 2k-token context covers it comfortably. Output is compact: a ranked list, a rephrased query, or a structured filter object. The 80/20 input-to-output token ratio reflects the context-heavy nature of retrieval reranking.
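As a rough illustration of the token split described above, the sketch below divides a per-request budget into input and output tokens. It assumes the 2k figure is the total token budget per request and that 80/20 means 80% input and 20% output; the function name `split_tokens` is hypothetical and not part of the calculator.

```python
def split_tokens(total_tokens: int, input_share: float) -> tuple[int, int]:
    """Split a per-request token budget into (input, output) counts.

    Assumption: `total_tokens` is the whole request budget and
    `input_share` is the fraction spent on input (0.80 for an 80/20 preset).
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = round(total_tokens * input_share)
    # Whatever is left after input goes to output, so the two always sum
    # back to the total.
    return input_tokens, total_tokens - input_tokens


# Assumed preset values: 2,000 tokens per request at an 80/20 split.
input_tokens, output_tokens = split_tokens(2000, 0.80)
print(input_tokens, output_tokens)
```

Under these assumptions most of the budget is spent on input, which is why input pricing and prompt caching tend to dominate the per-request cost for this preset.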

A p95 latency under 2s is a hard product requirement; search latency above 2s measurably hurts conversion. This is one of the highest-throughput presets in the catalog: Black Friday peaks can reach 10x baseline requests per minute (RPM), so verify your provider's burst limits, not just its sustained limits. The 30% cached-prompt setting reflects a stable system prompt and category taxonomy that repeat across requests. Cost optimization at scale favors smaller, faster models over larger ones: quality differences between a 7B and a 70B model narrow significantly on short reranking tasks.
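The cost and burst reasoning above can be sketched as a small calculation. This is a minimal sketch under stated assumptions, not the calculator's actual formula: it assumes the 30% cached figure is the share of input tokens billed at a discounted cached rate, that each request uses 1,600 input and 400 output tokens, and that peak traffic is 10x baseline RPM. The `Pricing` values are hypothetical placeholders, not real provider rates, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per one million tokens (not real rates)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def cost_per_request(input_tokens: int, output_tokens: int,
                     cached_share: float, pricing: Pricing) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_share` of the input tokens is billed at the
    cached rate and the remainder at the normal input rate.
    """
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    total = (fresh * pricing.input_per_m
             + cached * pricing.cached_input_per_m
             + output_tokens * pricing.output_per_m)
    return total / 1_000_000


def burst_headroom_ok(baseline_rpm: float, provider_burst_rpm: float,
                      peak_multiplier: float = 10.0) -> bool:
    """Return True if the provider's burst limit covers the assumed peak."""
    return baseline_rpm * peak_multiplier <= provider_burst_rpm


# Placeholder prices chosen only to make the arithmetic easy to follow.
example_pricing = Pricing(input_per_m=0.10, cached_input_per_m=0.05,
                          output_per_m=0.40)
per_request = cost_per_request(1600, 400, 0.30, example_pricing)
print(f"cost per request: ${per_request:.6f}")
print("burst ok:", burst_headroom_ok(baseline_rpm=500,
                                     provider_burst_rpm=4000))
```

With a baseline of 500 RPM, a 10x peak needs 5,000 RPM, so a burst limit of 4,000 RPM fails the check even though it comfortably covers sustained traffic. That is the gap the paragraph above warns about.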

Recommended models

Fast inference at low cost; quality is sufficient for structured reranking on short contexts.
Strong instruction following on short structured tasks; competitive latency at 2k context.
Very fast and cheap; widely deployed with predictable latency at high RPM.
Good quality-speed tradeoff on classification and ranking tasks; low output cost at 20% output ratio.
Step up to this when query understanding quality matters more than per-request cost.