The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences
Modalities: text, vision
Recommended hardware: 8x H200
Developer: Meta-Llama
Model: Llama 4 Scout
Total parameters: 109B
Context window: 10,000,000 tokens
License: Llama 4 Community License
Llama 4 Scout represents Meta's efficiency-focused entry in the Llama 4 series, combining natively multimodal capabilities with practical deployability. Released in April 2025, this model employs a mixture-of-experts (MoE) architecture with 17 billion activated parameters distributed across 16 experts, totaling 109 billion parameters. Scout achieves competitive performance while maintaining substantially lower computational requirements than its larger sibling, Maverick.
The model leverages early fusion for native multimodality within its MoE architecture, enabling integrated text-image understanding without separate encoding pipelines. A defining characteristic of Scout is its deployment efficiency: the model can fit within a single H100 GPU using on-the-fly int4 quantization, making it significantly more accessible for production environments.
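The single-GPU claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight memory only (KV cache, activations, and framework overhead are excluded), using the 109B total parameter count stated above:

```python
# Back-of-the-envelope memory check: do 109B parameters at int4 fit on one H100 (80 GiB)?
total_params = 109e9          # Scout's total parameter count
bytes_per_param_bf16 = 2.0    # 16-bit weights
bytes_per_param_int4 = 0.5    # 4-bit weights

gib = 1024 ** 3
bf16_gib = total_params * bytes_per_param_bf16 / gib  # ~203 GiB: needs multiple GPUs
int4_gib = total_params * bytes_per_param_int4 / gib  # ~51 GiB: fits in one 80 GiB H100

print(f"BF16 weights: {bf16_gib:.0f} GiB, int4 weights: {int4_gib:.0f} GiB")
```

Weights alone at int4 come in well under the 80 GiB on an H100, leaving headroom for the KV cache; at BF16 the weights alone exceed a single card, which is why the quantized path matters for single-GPU deployment.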
Trained on approximately 40 trillion tokens from publicly available sources, licensed datasets, and Meta products/services data, Scout incorporates knowledge through August 2024. Training consumed 5.0 million GPU hours on H100-80GB hardware, with releases available in BF16 format and int4 quantization support. The model supports a 10 million token context window and can process up to 5 input images simultaneously.
Scout provides comprehensive multilingual support across 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. This enables consistent performance across diverse linguistic contexts while maintaining the model's efficiency advantages.
Scout demonstrates competitive results across multiple evaluation domains for both the pre-trained and instruction-tuned variants, spanning reasoning, coding, multilingual, and image-understanding benchmarks. These results reflect Scout's balance between performance and computational efficiency, making it suitable for applications where resource constraints matter.
The model excels in applications that pair multimodal understanding with deployment efficiency, such as multi-document summarization, image-grounded question answering, and reasoning over long-context inputs like large codebases.
Meta implements comprehensive risk mitigation through three approaches: fine-tuning that emphasizes a natural refusal tone while reducing false rejections; system-level protections, including Llama Guard, Prompt Guard, and Code Shield; and extensive red teaming focused on CBRNE proliferation, child safety, and cyber-attack enablement. The model's system prompt encourages a conversational tone while avoiding preachy or templated language patterns.
Scout integrates with the transformers library (version 4.51.0+) using flex_attention for optimal performance. The model's implementation demonstrates straightforward integration into existing workflows, with support for both standard BF16 inference and efficient int4 quantization for resource-constrained environments.
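The workflow above can be sketched as follows. The checkpoint id follows Meta's published naming for the instruct variant, the image URL is a placeholder, and access to the gated weights (plus a GPU setup sized per the hardware notes above) is assumed:

```python
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumes gated-weight access

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="flex_attention",  # recommended attention backend (4.51.0+)
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Multimodal chat message: one image plus a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in two sentences."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
print(processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)[0])
```

For int4 deployment, the same loading call can instead take a `quantization_config` (e.g. a 4-bit bitsandbytes config) in place of the BF16 dtype.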