0 providers0 models

Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Gemma 2 2b It
vs
Llama 3.2 1b Instruct
vs
Llama 3.2 3b Instruct
Gemma 2 2b ItA

Gemma 2 2b It

Cheapest provider
$/1M input
$/1M output
Llama 3.2 1b InstructB

Llama 3.2 1b Instruct

Cheapest provider
$/1M input
$/1M output
Llama 3.2 3b InstructC

Llama 3.2 3b Instruct

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
SpecGemma 2 2b ItLlama 3.2 1b InstructLlama 3.2 3b Instruct
Parameters
Context window
License
Released
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
Benchmark comparison

No benchmark data available yet.

Editor's take
Gemma 2 2B IT, Llama 3.2 1B Instruct, and Llama 3.2 3B Instruct are all designed primarily for on-device and edge inference. All three were released in 2024, and all three are priced at the absolute floor of hosted inference — but quality and context trade-offs are worth understanding before choosing between them. Llama 3.2 1B Instruct is Meta's smallest model, released September 2024 under the Llama 3 community license. At 1 billion parameters, it is targeted squarely at phones and edge hardware where the 3B model exceeds memory budgets. The quality ceiling is real: on most instruction-following and summarization tasks it falls noticeably behind the 3B variant. Its primary use cases are latency testing at the smallest weight class, triage pipelines where a fast cheap initial filter reduces traffic to a larger model, or as an on-device inference baseline. Llama 3.2 3B Instruct, also from Meta and released September 2024, is a more practical choice for actual application tasks. It handles classification, short-form summarization, and content moderation routing acceptably. The 131K context window makes it useful for classifying over long inputs even though generation quality at 3B is modest. Sub-$0.10 per million tokens on several platforms as of early 2026. Gemma 2 2B IT from Google DeepMind is a 2-billion-parameter model released July 2024 under the Gemma license. On structured tasks like classification and named-entity extraction it runs comparably to Llama 3.2 3B, and the 8K context, while shorter, covers most edge inference scenarios. Self-hosting on cheap GPU or on-device deployment is where the economics favor it over hosted API calls. Pick Llama 3.2 1B for true memory-constrained edge hardware. Pick Llama 3.2 3B for the best quality-per-token ratio in this parameter class with a long context window. Pick Gemma 2 2B when on-device self-hosting in a Google ecosystem is preferred and the Gemma license terms are acceptable.
Compare two at a time
Frequently asked questions
How does Gemma 2 2b It compare to Llama 3.2 1b Instruct and Llama 3.2 3b Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
Which model is best for coding: Gemma 2 2b It, Llama 3.2 1b Instruct, or Llama 3.2 3b Instruct?
HumanEval and other code benchmarks are shown in the table. For production code tasks, also consider context window size and provider latency.
What is the context window for Gemma 2 2b It, Llama 3.2 1b Instruct, and Llama 3.2 3b Instruct?
Context window sizes are listed in the Specs row of the comparison table above.
Full model details