0 providers50 models

Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Gemma 2 27B IT
vs
Gemma 2 2B IT
vs
Gemma 2 9B IT
Gemma 2 27B ITA

Gemma 2 27B IT

27B params · 8K context · gemma

Cheapest provider
$/1M input
$/1M output
Gemma 2 2B ITB

Gemma 2 2B IT

2B params · 8K context · gemma

Cheapest provider
$/1M input
$/1M output
Gemma 2 9B ITC

Gemma 2 9B IT

9B params · 8K context · gemma

Cheapest providerdeepinfra
$/1M input$50000.00
$/1M output$60000.00
Specs and cheapest providers
SpecGemma 2 27B ITGemma 2 2B ITGemma 2 9B IT
Parameters27B2B9B
Context window8K tokens8K tokens8K tokens
Licensegemmagemmagemma
Released2024-07-312024-07-312024-07-31
Cheapest provider
Providerdeepinfra
Input / 1M tokens$50000.00
Output / 1M tokens$60000.00
Benchmark comparison

No benchmark data available yet.

Editor's take
Gemma 2 2B IT, Gemma 2 9B IT, and Gemma 2 27B IT are Google DeepMind's full Gemma 2 lineup, all released July 2024 and all carrying the same 8K context ceiling. All three are distilled from Gemini Ultra training data and share the Gemma license, which is permissive for commercial use but not OSI-approved — worth checking with legal before production deployment if your organization requires OSI-compliant licenses. The 8K context ceiling is the structural constraint that defines the entire family. For any workload requiring document-length inputs, multi-turn memory, or RAG over large corpora, all three models require chunking or an alternative. Teams building long-context applications should instead evaluate Llama 3.1 70B, Qwen 2.5 72B, or Phi-3 Medium before committing to Gemma 2. The 2B model targets edge and on-device inference. On classification and named-entity extraction it performs comparably to Llama 3.2 3B and IBM Granite 2B, at minimal compute cost. The economics typically favor self-hosting on cheap GPU or on-device deployment over hosted API usage at this scale. The 9B is the most practically deployed tier. It benchmarks closely with Llama 3.1 8B and Mistral 7B v0.3 on standard evals, and Groq hosts it with some of the lowest latency numbers available at sub-10B scale. Pricing across providers is competitive for high-throughput classification and short-form generation workloads. The 27B is the quality ceiling of the family, distilled from Gemini Ultra and posting competitive MT-Bench and MMLU scores for its parameter class. The 8K context remains the hard ceiling, but for single-turn instruction-following within that window it delivers reliable results. Pick the 2B for edge deployment or on-device inference. Pick the 9B for high-throughput hosted workloads that fit within 8K context. Pick the 27B when quality within short-context tasks is the priority.
Compare two at a time
Frequently asked questions
How does Gemma 2 27B IT compare to Gemma 2 2B IT and Gemma 2 9B IT on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
Which model is best for coding: Gemma 2 27B IT, Gemma 2 2B IT, or Gemma 2 9B IT?
HumanEval and other code benchmarks are shown in the table. For production code tasks, also consider context window size and provider latency.
What is the context window for Gemma 2 27B IT, Gemma 2 2B IT, and Gemma 2 9B IT?
Context window sizes are listed in the Specs row of the comparison table above.
Full model details