Yi 1.5 34B Chat vs Yi 1.5 9B Chat (2026) — pricing, benchmarks, cheapest providers

Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Model	Cheapest provider	$/1M input	$/1M output
Yi 1.5 34b Chat (A)	—	—	—
Yi 1.5 9b Chat (B)	—	—	—

Specs and cheapest providers

Spec	Yi 1.5 34b Chat	Yi 1.5 9b Chat
Parameters	—	—
Context window	—	—
License	—	—
Released	—	—
Cheapest provider
Provider	—	—
Input / 1M tokens	—	—
Output / 1M tokens	—	—


Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider

Yi 1.5 34b Chat: $0.00 /mo

Yi 1.5 9b Chat: $0.00 /mo

What changes at scale

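Which side of the bill dominates depends on both the token mix and the price gap: output spend exceeds input spend once output tokens × output price is greater than input tokens × input price. As a rule of thumb, workloads that emit three or more output tokens per input token (a 1:3 input/output ratio) are output-dominated. Workloads that emit fewer output tokens than input tokens (below 1:1) are usually input-dominated unless output is priced at a large multiple of input, and there providers with cheaper input win regardless of headline price.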
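The break-even point between input and output spend can be sketched in a few lines. This is a minimal illustration, not the site's calculator: the function names are hypothetical, and the prices in the usage note are made-up placeholders, since this page lists no provider pricing for either model.

```python
def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of total spend that comes from output tokens.

    Token counts are raw counts; prices are dollars per 1M tokens.
    """
    input_cost = input_tokens / 1e6 * input_price
    output_cost = output_tokens / 1e6 * output_price
    total = input_cost + output_cost
    if total == 0:
        raise ValueError("total cost is zero; check token counts and prices")
    return output_cost / total


def breakeven_output_per_input(input_price, output_price):
    """Output tokens per input token at which both sides cost the same.

    Above this ratio output spend dominates; below it input spend dominates.
    """
    if output_price <= 0:
        raise ValueError("output_price must be positive")
    return input_price / output_price
```

With assumed prices of $0.20 input and $0.60 output per 1M tokens, `breakeven_output_per_input(0.20, 0.60)` is about 0.33, so a 5M in + 2M out workload (0.4 output tokens per input token) is already slightly output-dominated.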

Workload	Yi 1.5 34b Chat	Yi 1.5 9b Chat
1M in · 250K out	$0.00	$0.00
5M in · 2M out	$0.00	$0.00
20M in · 10M out	$0.00	$0.00
100M in · 60M out	$0.00	$0.00
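Because no provider pricing is listed yet, the figures above cannot be reproduced from this page. The sketch below shows how the four workloads could be priced once rates are known. The prices in `ASSUMED_PRICES` are placeholders chosen for illustration (a single blended rate applied to both input and output), not quotes from any provider, and the function names are hypothetical.

```python
# The four workloads from the table above: (input tokens, output tokens) per month.
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]

# ASSUMED placeholder prices in dollars per 1M tokens: (input, output).
# Replace with real provider rates before relying on the output.
ASSUMED_PRICES = {
    "Yi 1.5 34B Chat": (0.30, 0.30),
    "Yi 1.5 9B Chat": (0.05, 0.05),
}


def monthly_cost(input_tokens, output_tokens, input_price, output_price):
    """Monthly cost in dollars for one model at the given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_table(workloads=WORKLOADS, prices=ASSUMED_PRICES):
    """Return [((tokens_in, tokens_out), {model: cost})] rounded to cents."""
    rows = []
    for tokens_in, tokens_out in workloads:
        costs = {
            model: round(monthly_cost(tokens_in, tokens_out, p_in, p_out), 2)
            for model, (p_in, p_out) in prices.items()
        }
        rows.append(((tokens_in, tokens_out), costs))
    return rows


if __name__ == "__main__":
    for (tokens_in, tokens_out), costs in cost_table():
        print(tokens_in, tokens_out, costs)
```

Under these assumed rates the cost gap between the two models stays proportional at every scale; only differing input and output prices would make the token mix change the ranking.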

Capability vs price

[Scatter plot: benchmark score vs. $/1M output tokens]


Editor's take

Within the Yi 1.5 family, the 34B and 9B models share architecture and training data but differ in parameter count, and that gap shows up in both cost and quality. [Yi 1.5 34B Chat](/models/01-ai--yi-1.5-34b-chat) typically costs $0.25–0.40 per 1M tokens versus $0.03–0.07 per 1M for the 9B, a roughly 5–8× cost ratio that makes model selection a straightforward cost-quality calculation.

On MMLU, the 34B scores approximately 76–78% versus the 9B's 65–68%, a gap of roughly 10 points. Multi-hop reasoning, long-context coherence, and adherence to complex instructions all improve substantially with the larger model. For tasks that require holding context across a full document (8K+ tokens) or following multi-constraint instructions, the 34B is noticeably more reliable.

[Yi 1.5 9B Chat](/models/01-ai--yi-1.5-9b-chat) handles single-turn Q&A, classification, summarization of short documents, and low-latency chatbot responses efficiently. Its memory footprint (it fits on a single A10G with room to spare) means better throughput per dollar on shared GPU infrastructure, and its cold-start latency is significantly lower than the 34B's.

Both models carry 01.AI's strong Mandarin Chinese training, so the language-quality advantage is less of a differentiator within the family; the choice reduces to task complexity and budget.

Pick Yi 1.5 9B Chat for high-volume, cost-sensitive workloads where single-turn quality in the 65–68% MMLU range is acceptable. Pick Yi 1.5 34B Chat when multi-step reasoning, longer context, or higher accuracy justify the 5–8× increase in token cost.
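The memory-footprint point can be checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory only, as parameter count times bytes per parameter, and compares it with the A10G's 24 GB. The nominal parameter counts (9 and 34 billion), the 2 GB headroom reserved for KV cache and activations, and the function names are all assumptions for illustration; real serving memory depends on batch size, context length, and the inference stack.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

A10G_MEMORY_GB = 24.0  # NVIDIA A10G memory capacity


def weight_memory_gb(params_billions, dtype="fp16"):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_on_gpu(params_billions, gpu_memory_gb, dtype="fp16", headroom_gb=2.0):
    """True if weights plus an ASSUMED fixed headroom fit in GPU memory."""
    return weight_memory_gb(params_billions, dtype) + headroom_gb <= gpu_memory_gb


if __name__ == "__main__":
    for name, size in [("Yi 1.5 9B Chat", 9), ("Yi 1.5 34B Chat", 34)]:
        for dtype in BYTES_PER_PARAM:
            print(name, dtype, weight_memory_gb(size, dtype),
                  fits_on_gpu(size, A10G_MEMORY_GB, dtype))
```

Under these assumptions the 9B needs about 18 GB at fp16 and fits on one A10G, while the 34B needs about 68 GB at fp16 and would only fit on a single 24 GB card when quantized to roughly 4 bits.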

Related comparisons

Yi 1.5 34b Chat vs Qwen 3 32b Instruct
Yi 1.5 34b Chat vs Gemma 2 27b It
Yi 1.5 9b Chat vs Llama 3.1 8b Instruct
Yi 1.5 9b Chat vs Qwen 3 8b Instruct

Full model details

All providers for Yi 1.5 34b Chat
All providers for Yi 1.5 9b Chat