Use-case preset
E-commerce product search cost calculator
Query understanding and ranking for interactive product search under 2s.
Each request is a short user query plus a small product catalog slice or ranking context, so a 2k context covers it comfortably. Output is compact: a ranked list, a rephrased query, or a structured filter object. The 80/20 input-to-output ratio reflects the context-heavy nature of retrieval reranking.
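As a rough illustration of what the ratio means in tokens, the sketch below splits a per-request budget into input and output counts. It assumes the 2k figure is the total token budget per request and that 80/20 is the input versus output share; the preset does not spell out either point, and `split_tokens` is a hypothetical helper, not part of the calculator.

```python
def split_tokens(total_tokens: int, input_share: float) -> tuple[int, int]:
    """Split a per-request token budget into (input, output) token counts."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = round(total_tokens * input_share)
    # Output gets whatever remains so the two parts always sum to the budget.
    return input_tokens, total_tokens - input_tokens


# Assumption: 2,000 total tokens per request, 80% of them input.
input_tokens, output_tokens = split_tokens(2000, 0.80)
print(input_tokens, output_tokens)  # 1600 400
```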
A p95 latency under 2s is a hard product requirement; search latency above 2s measurably hurts conversion. This is one of the highest-throughput presets in the catalog. Black Friday peaks can reach 10x baseline requests per minute (RPM), so verify your provider's burst limits, not just its sustained limits. The 30% cached-prompt setting reflects a stable system prompt and category taxonomy that repeat across requests. Cost optimization at scale favors smaller, faster models over larger ones: quality differences between a 7B and a 70B model narrow significantly on short reranking tasks.
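To make the cost and burst arithmetic concrete, here is a minimal sketch under stated assumptions. It assumes a 2,000-token request split into 1,600 input and 400 output tokens, treats the 30% cache figure as the share of input tokens billed at a cached rate, and uses a made-up baseline of 1,000 RPM. All prices are placeholder values in USD per million tokens, not any provider's actual rates, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Preset:
    input_tokens: int = 1600      # assumed: 80% of a 2,000-token request
    output_tokens: int = 400      # assumed: remaining 20%
    cached_share: float = 0.30    # share of input tokens served from cache
    baseline_rpm: int = 1000      # hypothetical baseline requests per minute
    peak_multiplier: int = 10     # Black Friday style burst factor


@dataclass
class Pricing:
    # Placeholder prices in USD per 1M tokens; substitute your provider's rates.
    input_per_m: float = 0.20
    cached_input_per_m: float = 0.05
    output_per_m: float = 0.60


def cost_per_request(preset: Preset, pricing: Pricing) -> float:
    """Return the USD cost of one request, billing cached input at its own rate."""
    cached = preset.input_tokens * preset.cached_share
    fresh = preset.input_tokens - cached
    total = (
        fresh * pricing.input_per_m
        + cached * pricing.cached_input_per_m
        + preset.output_tokens * pricing.output_per_m
    )
    return total / 1_000_000


def peak_rpm(preset: Preset) -> int:
    """Return the burst request rate the provider must be able to absorb."""
    return preset.baseline_rpm * preset.peak_multiplier


preset, pricing = Preset(), Pricing()
print(f"{cost_per_request(preset, pricing):.6f}")  # 0.000488
print(peak_rpm(preset))                            # 10000
```

With these placeholder numbers, the peak rate is the one to compare against a provider's burst limit, while the per-request cost multiplied by expected volume gives the spend estimate.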