The Llama 4 collection comprises natively multimodal AI models that enable combined text and vision experiences.
Modalities: text, vision
Hardware: 8xH200
Organization: Meta-Llama
Model: Maverick
Total parameters: 400B
Context length: 1,000,000 tokens
License: Llama 4 Community License
Llama 4 Maverick is a natively multimodal AI model featuring a mixture-of-experts (MoE) architecture with 17 billion activated parameters distributed across 128 total experts. Released by Meta in April 2025, this model represents a significant advancement in the Llama ecosystem by combining text and image understanding capabilities within a unified architecture.
The model employs an auto-regressive language architecture with a mixture-of-experts design and early fusion for native multimodality. This allows text and visual inputs to be processed through a single pipeline rather than separate encoding paths. The model supports a context length of 1 million tokens and accepts up to 5 input images per request.
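As a sketch of what a mixed text-and-image request to such a model might look like, the snippet below builds an OpenAI-style chat payload that interleaves a text prompt with image references and enforces the 5-image limit stated above. The model identifier, field layout, and URLs are illustrative assumptions, not taken from Meta's documentation:

```python
# Build an OpenAI-compatible chat payload mixing text and image inputs.
# The model name and image URLs below are illustrative assumptions.
def build_multimodal_payload(prompt, image_urls,
                             model="meta-llama/Llama-4-Maverick"):
    if len(image_urls) > 5:  # the card states up to 5 input images
        raise ValueError("Llama 4 Maverick accepts at most 5 input images")
    # One user turn whose content list interleaves text and image parts.
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}}
        for url in image_urls
    ]
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }

payload = build_multimodal_payload(
    "Compare these two charts.",
    ["https://example.com/a.png", "https://example.com/b.png"],
)
```

Because both modalities travel in the same message, no separate vision endpoint or preprocessing call is needed on the client side.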
Trained on approximately 22 trillion tokens drawn from publicly available sources, licensed datasets, and data from Meta's products and services, the model has a knowledge cutoff of August 2024. Training consumed 2.38 million GPU hours on H100-80GB hardware, and the weights are released in both BF16 and FP8 formats.
The model provides comprehensive multilingual support across 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. This enables deployment in diverse global contexts while maintaining consistent performance across linguistic boundaries.
Llama 4 Maverick demonstrates strong results across multiple evaluation domains, reflecting its versatility in handling both traditional language tasks and advanced visual reasoning challenges.
The model excels in applications requiring multimodal understanding, where text and visual inputs must be reasoned over together.
Llama 4 Maverick emphasizes improved system prompt steerability, allowing developers greater control over model behavior. The model exhibits reduced false refusals to benign queries while maintaining comprehensive safety fine-tuning. This balance enables more natural conversational tones while preserving flexibility for application-specific customization.
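The steerability described above comes down to the developer-supplied system prompt. A minimal sketch of assembling a steered conversation follows; the prompt wording and helper name are hypothetical examples, not part of Meta's documentation:

```python
# Sketch of steering model tone and refusal behavior via the system prompt.
# The system prompt text here is an illustrative assumption.
def make_chat(system_prompt, user_turns):
    # The system message leads the conversation and shapes all later replies.
    messages = [{"role": "system", "content": system_prompt}]
    messages += [{"role": "user", "content": turn} for turn in user_turns]
    return messages

messages = make_chat(
    "You are a concise technical assistant. Answer benign questions "
    "directly rather than refusing.",
    ["Summarize the Llama 4 Maverick architecture in one sentence."],
)
```

An application-specific persona or policy can thus be swapped in per deployment without retraining, which is the customization flexibility the card refers to.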