
Use-case preset

Synthetic data generation cost calculator

Generate labeled training rows from a schema and seed; output-heavy batch workload.

Generate labeled training rows from a schema definition and seed examples: the prompt provides column definitions, constraints, and 2–5 example rows, and the model outputs a batch of 10–50 new rows in JSON or CSV. The workload runs as an offline pipeline that fills a training dataset.
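As a rough illustration of the prompt shape described above, the sketch below assembles column definitions, constraints, and seed rows into one prompt and filters a returned JSON batch down to rows that match the schema. The helper names (`build_prompt`, `parse_rows`), the column layout, and the sample data are all hypothetical; no particular model API is assumed.

```python
import json

def build_prompt(columns, seed_rows, n_rows):
    """Assemble a compact schema prompt: column definitions, constraints, seed rows."""
    lines = ["Generate new labeled rows as a JSON array of objects.", "Columns:"]
    for col in columns:
        lines.append(f"- {col['name']} ({col['type']}): {col['constraint']}")
    lines.append("Example rows:")
    for row in seed_rows:
        lines.append(json.dumps(row, sort_keys=True))
    lines.append(f"Return exactly {n_rows} new rows that do not repeat the examples.")
    return "\n".join(lines)

def parse_rows(raw_output, columns):
    """Keep only rows that are objects with exactly the expected column names."""
    expected = {col["name"] for col in columns}
    rows = json.loads(raw_output)
    return [r for r in rows if isinstance(r, dict) and set(r) == expected]

# Hypothetical schema and seed rows, for illustration only.
COLUMNS = [
    {"name": "text", "type": "string", "constraint": "one customer-support sentence"},
    {"name": "label", "type": "string", "constraint": "one of: billing, shipping, other"},
]
SEEDS = [
    {"text": "I was charged twice this month.", "label": "billing"},
    {"text": "My parcel has not arrived.", "label": "shipping"},
]

prompt = build_prompt(COLUMNS, SEEDS, n_rows=30)
```

Filtering on exact column names is a deliberately strict choice: rows with missing or extra fields are dropped rather than repaired, which keeps the training set consistent at the cost of some wasted output tokens.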

The 30/70 input/output ratio reflects a compact schema prompt against a large tabular output. A 4k context window holds the schema plus roughly 30 structured output rows, which is the practical batch size before quality degrades. Latency is batch, best-effort: speed matters only for iteration velocity during dataset construction. `cachedPromptPercent` is ~10 because the schema is constant but each generation call should produce unique rows, which limits cache reuse.

Watch output entropy: models tend to repeat patterns after 20–30 rows in a single call. Issuing multiple independent calls is better than pushing past 4k output tokens, which increases repetition and reduces diversity in the synthetic dataset.
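The arithmetic behind this preset can be sketched as follows. The example assumes the 4k window is the total per-call token budget split 30/70 between input and output, that 10% of input tokens are billed at a cached rate, and that a dataset is built from independent calls of about 30 rows each. The function names and all three prices (USD per million tokens) are placeholders, not figures from the calculator.

```python
import math

def plan_calls(target_rows, rows_per_call=30):
    """Number of independent calls needed to reach the target row count."""
    return math.ceil(target_rows / rows_per_call)

def estimate_cost(
    target_rows,
    input_price,           # assumed USD per 1M uncached input tokens
    cached_input_price,    # assumed USD per 1M cached input tokens
    output_price,          # assumed USD per 1M output tokens
    rows_per_call=30,
    tokens_per_call=4000,  # assumption: 4k total budget per call
    input_share=0.30,      # 30/70 input/output ratio
    cached_prompt_percent=10,
):
    """Estimate total cost of generating target_rows across independent calls."""
    calls = plan_calls(target_rows, rows_per_call)
    input_tokens = tokens_per_call * input_share
    output_tokens = tokens_per_call - input_tokens
    cached_tokens = input_tokens * cached_prompt_percent / 100
    uncached_tokens = input_tokens - cached_tokens
    per_call = (
        uncached_tokens * input_price
        + cached_tokens * cached_input_price
        + output_tokens * output_price
    ) / 1_000_000
    return calls, per_call * calls

# Placeholder prices, chosen only to make the arithmetic easy to follow.
calls, total = estimate_cost(
    target_rows=10_000,
    input_price=1.00,
    cached_input_price=0.10,
    output_price=4.00,
)
print(f"{calls} calls, estimated total ${total:.2f}")
```

With these placeholder prices, output tokens account for most of the per-call cost, which is why the preset is described as output-heavy: the output price dominates model selection more than the input or cached-input price.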

Recommended models

High diversity in generated rows with reliable schema adherence; good default for complex schemas.
Strong structured output generation; handles nested JSON schemas cleanly at a mid-tier price.
Excellent at generating diverse, realistic tabular data with varied field distributions.
Cost-efficient MoE model with solid instruction following for structured output generation at scale.