The Llama 4 collection comprises natively multimodal AI models that enable combined text and vision experiences.
Modalities: text, vision
Hardware: 8xH200
Organization: Meta-Llama
Model: Maverick
Total parameters: 400B
Context length: 1,000,000 tokens
License: Llama 4 Community License
Llama 4 Maverick is a natively multimodal AI model featuring a mixture-of-experts (MoE) architecture with 17 billion activated parameters distributed across 128 total experts. Released by Meta in April 2025, this model represents a significant advancement in the Llama ecosystem by combining text and image understanding capabilities within a unified architecture.
The model employs an auto-regressive language architecture with a mixture-of-experts design and early fusion for native multimodality. This allows text and visual inputs to be processed through a single pipeline rather than separate encoding paths. The model supports a context length of 1 million tokens and accepts up to 5 input images per request.
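As a sketch of what a mixed text-and-image request to such a model might look like, the snippet below builds an OpenAI-style chat payload that interleaves a text prompt with image references and enforces the 5-image limit stated above. The model identifier, field layout, and URLs are illustrative assumptions, not taken from Meta's documentation:

```python
# Build an OpenAI-compatible chat payload mixing text and image inputs.
# The model name and image URLs below are illustrative assumptions.
def build_multimodal_payload(prompt, image_urls,
                             model="meta-llama/Llama-4-Maverick"):
    if len(image_urls) > 5:  # the card states up to 5 input images
        raise ValueError("Llama 4 Maverick accepts at most 5 input images")
    # One user turn whose content list interleaves text and image parts.
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}}
        for url in image_urls
    ]
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }

payload = build_multimodal_payload(
    "Compare these two charts.",
    ["https://example.com/a.png", "https://example.com/b.png"],
)
```

Because both modalities travel in the same message, no separate vision endpoint or preprocessing call is needed on the client side.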
Trained on approximately 22 trillion tokens drawn from publicly available sources, licensed datasets, and data from Meta's products and services, the model has a knowledge cutoff of August 2024. Training consumed 2.38 million GPU hours on H100-80GB hardware, and the weights are released in both BF16 and FP8 formats.
The model provides comprehensive multilingual support across 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. This enables deployment in diverse global contexts while maintaining consistent performance across linguistic boundaries.
Llama 4 Maverick demonstrates strong results across multiple evaluation domains, reflecting its versatility in handling both traditional language tasks and advanced visual reasoning challenges.
The model excels in applications requiring multimodal understanding, where text and visual inputs must be reasoned over together.
Llama 4 Maverick emphasizes improved system prompt steerability, allowing developers greater control over model behavior. The model exhibits reduced false refusals to benign queries while maintaining comprehensive safety fine-tuning. This balance enables more natural conversational tones while preserving flexibility for application-specific customization.
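The steerability described above comes down to the developer-supplied system prompt. A minimal sketch of assembling a steered conversation follows; the prompt wording and helper name are hypothetical examples, not part of Meta's documentation:

```python
# Sketch of steering model tone and refusal behavior via the system prompt.
# The system prompt text here is an illustrative assumption.
def make_chat(system_prompt, user_turns):
    # The system message leads the conversation and shapes all later replies.
    messages = [{"role": "system", "content": system_prompt}]
    messages += [{"role": "user", "content": turn} for turn in user_turns]
    return messages

messages = make_chat(
    "You are a concise technical assistant. Answer benign questions "
    "directly rather than refusing.",
    ["Summarize the Llama 4 Maverick architecture in one sentence."],
)
```

An application-specific persona or policy can thus be swapped in per deployment without retraining, which is the customization flexibility the card refers to.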