Use-case preset
Data cleaning and normalization cost calculator
Normalize messy CSV rows, addresses, and free-text fields at batch scale.
Data cleaning and normalization takes messy CSV rows (inconsistent address formats, free-text product names, misspelled fields) and returns standardized structured output for each row. A typical record with schema context runs 200–500 input tokens and yields 100–200 tokens of normalized output, which gives the preset its roughly 70/30 input/output split within a compact 2k-token window.
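The 70/30 figure follows directly from the token ranges quoted above. As a quick check, the sketch below computes the input/output share at the low end, midpoint, and high end of those ranges. The function name `input_output_split` is illustrative and not part of the calculator.

```python
def input_output_split(input_tokens: int, output_tokens: int) -> tuple[int, int]:
    """Return (input %, output %) of the total tokens for one row, rounded."""
    total = input_tokens + output_tokens
    input_pct = round(100 * input_tokens / total)
    return input_pct, 100 - input_pct


# Token counts taken from the ranges in the text: 200-500 in, 100-200 out.
low = input_output_split(200, 100)
mid = input_output_split(350, 150)
high = input_output_split(500, 200)

for label, split in (("low", low), ("mid", mid), ("high", high)):
    print(f"{label}: {split[0]}% input / {split[1]}% output")
```

Across the quoted range the input share stays close to 70%, so a single 70/30 preset is a reasonable summary of the workload.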
Batch scheduling applies throughout, so per-row cost drives the economics. A cachedPromptPercent of 45 covers the schema definition, normalization rules, and output format template that prefix every row, which makes this shared prefix the highest-leverage cache target. Small instruction models (7B–14B) handle well-defined normalization rules reliably; larger models add cost without proportional accuracy gains on structured transformation tasks. The main quality risk is schema drift: when the source data format changes, the cached system prompt becomes stale and accuracy degrades silently.
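To make the effect of the cached prefix concrete, here is a minimal sketch of a per-row cost estimate. It is not the calculator's actual formula. The prices, the 50% discount on cached input tokens, and every name other than cachedPromptPercent are assumptions chosen for illustration; substitute your provider's real rates.

```python
def cost_per_row(
    input_tokens: int,
    output_tokens: int,
    cached_prompt_percent: float,
    input_price_per_m: float,   # assumed price per 1M input tokens
    output_price_per_m: float,  # assumed price per 1M output tokens
    cached_discount: float,     # assumed fractional discount on cached input
) -> float:
    """Estimate the cost of one row when part of the prompt is served from cache."""
    cached = input_tokens * cached_prompt_percent / 100
    uncached = input_tokens - cached
    # Cached tokens are billed at a reduced rate; uncached tokens at full rate.
    billable_input = uncached + cached * (1 - cached_discount)
    input_cost = billable_input * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical rates: 0.10 per 1M input tokens, 0.30 per 1M output tokens,
# and cached input billed at half price.
with_cache = cost_per_row(350, 150, 45, 0.10, 0.30, 0.5)
without_cache = cost_per_row(350, 150, 0, 0.10, 0.30, 0.5)

rows = 1_000_000
print(f"with cache:    {with_cache * rows:.2f} per {rows:,} rows")
print(f"without cache: {without_cache * rows:.2f} per {rows:,} rows")
```

Because the output tokens are billed the same either way, the saving comes entirely from the input side, which is why the shared prefix is the part worth caching.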