
Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Llama 3.1 405B Instruct
vs
Llama 3.2 11B Vision Instruct
vs
Llama 3.2 90B Vision Instruct
A: Llama 3.1 405B Instruct
405B params · 131K context · llama-3
Cheapest provider: deepinfra
$/1M input: $2700000.00
$/1M output: $8000000.00

B: Llama 3.2 11B Vision Instruct
11B params · 131K context · llama-3
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed

C: Llama 3.2 90B Vision Instruct
90B params · 131K context · llama-3
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed

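Specs and cheapest providers

Spec | Llama 3.1 405B Instruct | Llama 3.2 11B Vision Instruct | Llama 3.2 90B Vision Instruct
Parameters | 405B | 11B | 90B
Context window | 131K tokens | 131K tokens | 131K tokens
License | llama-3 | llama-3 | llama-3
Released | 2024-07-23 | 2024-09-25 | 2024-09-25

Cheapest provider | Llama 3.1 405B Instruct | Llama 3.2 11B Vision Instruct | Llama 3.2 90B Vision Instruct
Provider | deepinfra | not listed | not listed
Input / 1M tokens | $2700000.00 | not listed | not listed
Output / 1M tokens | $8000000.00 | not listed | not listed
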
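Per-1M-token prices turn into a per-request cost by scaling each price by the number of tokens used. The following is a minimal sketch in Python, not a provider API: the function name, the token counts, and the prices are hypothetical placeholders chosen for illustration and are not taken from the table.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate the cost of one request from per-1M-token prices (USD)."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_1m
    output_cost = (output_tokens / 1_000_000) * output_price_per_1m
    return input_cost + output_cost


# Hypothetical example: 200K input tokens and 50K output tokens,
# priced at placeholder rates of $2.00 (input) and $8.00 (output) per 1M.
example_cost = request_cost_usd(200_000, 50_000, 2.00, 8.00)
print(f"Estimated cost: ${example_cost:.2f}")
```

Because output tokens are usually priced higher than input tokens, workloads with long generations tend to be dominated by the output term.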
Benchmark comparison

No benchmark data available yet.

Editor's take
Llama 3.1 405B Instruct, Llama 3.2 11B Vision Instruct, and Llama 3.2 90B Vision Instruct are all Meta open-weights models under the Llama 3 community license, but they target substantially different use cases. The 405B is a dense text-only model at Meta's open-weights capability ceiling; the 11B and 90B Vision models are the September 2024 multimodal pair, built to handle both image and text inputs with a 131K context window.

Llama 3.2 11B Vision is the cost-efficient multimodal option. It shares the same vision encoder architecture as the 90B but runs at 11B parameters, making it meaningfully cheaper per token and per GPU-hour. For image classification, lightweight OCR pipelines, and document-layout understanding where budget outweighs peak accuracy, this is the model to benchmark first. Expect accuracy below the 90B, in line with the difference in scale.

Llama 3.2 90B Vision is the higher-fidelity choice for visual tasks. On the ChartQA and DocVQA benchmarks it approaches or matches proprietary mid-tier VLMs. If your pipeline processes image-rich documents, complex charts, or mixed text-and-visual inputs where accuracy is visible to users, the 90B is worth the cost premium over the 11B.

Llama 3.1 405B has no vision capability but reaches further than either Vision model on complex reasoning, extended code generation, and long-form text analysis. It is the pick when the task is entirely text-based and 70B-class models visibly fall short. Multi-GPU hosting requirements and thinner provider availability make it a specialized choice.

Pick 11B Vision for cost-efficient image pipelines. Pick 90B Vision for accuracy-sensitive visual understanding tasks. Pick 405B when the work is text-only and task complexity genuinely justifies the largest available open-weights model.
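
The picks above amount to a small decision rule. A minimal sketch in Python, assuming a simplified task description: the `Task` fields and the `pick_model` function are hypothetical names introduced here for illustration, and the rule only encodes the editor's guidance, not any benchmark result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    needs_vision: bool                 # input includes images, charts, or scanned documents
    accuracy_sensitive: bool = False   # accuracy is directly visible to users
    smaller_models_fall_short: bool = False  # 70B-class text models were tried and fell short


def pick_model(task: Task) -> Optional[str]:
    """Return the model suggested by the editor's take, or None if it gives no pick."""
    if task.needs_vision:
        # The 405B is text-only, so vision work is a choice between the two Vision models.
        if task.accuracy_sensitive:
            return "Llama 3.2 90B Vision Instruct"
        return "Llama 3.2 11B Vision Instruct"
    if task.smaller_models_fall_short:
        return "Llama 3.1 405B Instruct"
    # Text-only work that smaller models handle well: none of the three is recommended.
    return None


print(pick_model(Task(needs_vision=True)))
print(pick_model(Task(needs_vision=True, accuracy_sensitive=True)))
print(pick_model(Task(needs_vision=False, smaller_models_fall_short=True)))
```

Returning `None` for ordinary text-only work reflects that the guidance reserves the 405B for tasks where smaller models visibly fall short.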
Frequently asked questions
How does Llama 3.1 405B Instruct compare to Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct on price?
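The table above lists input and output prices per 1M tokens from the cheapest available provider for each model. Only Llama 3.1 405B Instruct currently has a listed provider (deepinfra); no provider pricing is shown for the two Vision models.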
Which model is best for coding: Llama 3.1 405B Instruct, Llama 3.2 11B Vision Instruct, or Llama 3.2 90B Vision Instruct?
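No benchmark data, including HumanEval and other code benchmarks, is available for these models yet. Per the editor's take, Llama 3.1 405B Instruct reaches further on extended code generation than either Vision model. For production code tasks, also consider context window size and provider latency.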
What is the context window for Llama 3.1 405B Instruct, Llama 3.2 11B Vision Instruct, and Llama 3.2 90B Vision Instruct?
All three models have a 131K-token context window, as listed in the Context window row of the specs table above.