0 providers · 50 models

Model crosswalk

Three-way, side-by-side comparison on price, capability, and workload.

OLMo 2 13B Instruct vs Qwen 3 14B Instruct vs StarCoder2 15B Instruct
A · OLMo 2 13B Instruct

13B params · 4K context · apache-2.0

Cheapest provider
$/1M input
$/1M output
B · Qwen 3 14B Instruct

14B params · 131K context · qwen

Cheapest provider
$/1M input
$/1M output
C · StarCoder2 15B Instruct

15B params · 16K context · bigcode-openrail-m

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
Spec              OLMo 2 13B Instruct    Qwen 3 14B Instruct    StarCoder2 15B Instruct
Parameters        13B                    14B                    15B
Context window    4K tokens              131K tokens 🏆         16K tokens
License           apache-2.0             qwen                   bigcode-openrail-m
Released          2024-11-21             2025-04-28             2024-09-06
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
Benchmark comparison

No benchmark data available yet.

Editor's take
OLMo 2 13B Instruct, Qwen 3 14B Instruct, and StarCoder2 15B Instruct are three models in the 13–15B range with distinct target audiences. Comparing them is less about who wins on a general leaderboard and more about matching model design to workload: research reproducibility, multilingual production inference, or code generation with provenance guarantees.

OLMo 2 13B Instruct from Allen AI is the most openly documented model in this comparison: Apache 2.0 weights, the Dolma corpus, OLMo-mix training code, and public evaluation results are all published together. Released in November 2024, it is the model academic teams and reproducibility-focused researchers reach for. Its 4K context ceiling and thin hosted coverage limit its use as a production inference endpoint; it is better understood as a fine-tuning base or research reference.

Qwen 3 14B Instruct from Alibaba brings a 131K context window, strong CJK multilingual handling, and latency comparable to smaller Llama and Mistral models. For production deployments such as customer-facing applications, multilingual summarization, and document extraction, it is the pragmatic choice in this size class. The Qwen license permits commercial use with attribution.

StarCoder2 15B Instruct from BigCode is a 15-billion-parameter code model trained on The Stack v2, a dataset restricted to permissively licensed source code. The 16K context window handles most single-file and small multi-file completion tasks. On HumanEval it has been overtaken by DeepSeek Coder V2 and Qwen 2.5 Coder, so it is not the highest-scoring coding choice. Its differentiated value is training data provenance: teams in regulated industries with strict IP policies around model training data favor StarCoder2 because every training example carries a verified open-source license. It is released under BigCode OpenRAIL-M.

Pick OLMo 2 13B for academic research, reproducibility, or use as a fine-tuning base. Pick Qwen 3 14B for multilingual production inference with long-context capability. Pick StarCoder2 15B for code generation when verifiable training data provenance is a hard requirement.
Frequently asked questions
How does OLMo 2 13B Instruct compare to Qwen 3 14B Instruct and StarCoder2 15B Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
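Per-1M-token prices turn into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name and the prices in the example call are placeholders, not quotes for any model or provider on this page.

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Cost in USD of one request, given prices quoted per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices (assumed for illustration, not real quotes):
    # $0.10 per 1M input tokens, $0.30 per 1M output tokens.
    cost = request_cost_usd(2_000, 500,
                            input_price_per_m=0.10,
                            output_price_per_m=0.30)
    print(f"${cost:.6f} per request")
```

Because output tokens are usually priced higher than input tokens, the ratio of output to input in a workload can matter as much as the headline input price.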
Which model is best for coding: OLMo 2 13B Instruct, Qwen 3 14B Instruct, or StarCoder2 15B Instruct?
HumanEval and other code benchmarks are listed under Benchmark comparison when data is available; no benchmark data is published for these three models yet. For production code tasks, also consider context window size and provider latency.
What is the context window for OLMo 2 13B Instruct, Qwen 3 14B Instruct, and StarCoder2 15B Instruct?
4K tokens for OLMo 2 13B Instruct, 131K tokens for Qwen 3 14B Instruct, and 16K tokens for StarCoder2 15B Instruct, as listed in the Context window row of the specs table above.
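A quick way to use these figures is to check whether a prompt plus its expected output fits a model's window. The sketch below assumes "4K", "16K" and "131K" mean 4,096, 16,384 and 131,072 tokens, and that token counts come from each model's own tokenizer; the dictionary and function names are hypothetical.

```python
# Context windows from the specs table, in tokens.
# Assumption: "K" is read as 1,024 tokens.
CONTEXT_WINDOW_TOKENS = {
    "OLMo 2 13B Instruct": 4_096,
    "Qwen 3 14B Instruct": 131_072,
    "StarCoder2 15B Instruct": 16_384,
}


def models_that_fit(prompt_tokens, max_output_tokens):
    """Return the models whose window holds the prompt plus the output budget."""
    needed = prompt_tokens + max_output_tokens
    return [name for name, window in CONTEXT_WINDOW_TOKENS.items()
            if needed <= window]


if __name__ == "__main__":
    # A 10,000-token prompt with a 1,000-token reply needs 11,000 tokens.
    print(models_that_fit(10_000, 1_000))
```

Token counts differ between tokenizers, so the same text may fit one model's window and overflow another's even at equal nominal sizes.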