Llama 4 Scout 17B 16E Instruct

LLM
Reasoning
Vision Language

The Llama 4 collection consists of natively multimodal AI models that enable both text-only and combined text-and-image experiences

On-Demand Dedicated 8xH200

Details

Modalities: text, vision
Version: 0000
Recommended Hardware: 8xH200
Provider: Meta-Llama
Family: Scout
Parameters: 109B
Context: 10,000,000 tokens
License: Llama 4 Community License

Llama 4 Scout 17B 16E Instruct: Efficient Multimodal Intelligence

Llama 4 Scout represents Meta's efficiency-focused entry in the Llama 4 series, combining natively multimodal capabilities with practical deployability. Released in April 2025, this model employs a mixture-of-experts (MoE) architecture with 17 billion activated parameters distributed across 16 experts, totaling 109 billion parameters. Scout achieves competitive performance while maintaining substantially lower computational requirements than its larger sibling, Maverick.

Architecture and Efficiency Design

The model leverages early fusion for native multimodality within its MoE architecture, enabling integrated text-image understanding without separate encoding pipelines. A defining characteristic of Scout is its deployment efficiency: the model can fit within a single H100 GPU using on-the-fly int4 quantization, making it significantly more accessible for production environments.
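
For teams taking the single-GPU route, the sketch below shows one way to apply on-the-fly 4-bit quantization when loading the checkpoint through the transformers API. It is a minimal sketch under assumptions: the Hugging Face repository name, the bitsandbytes-based quantization config, and the automatic device mapping are illustrative choices, not details taken from this page.

```python
# Minimal sketch: load Scout with on-the-fly 4-bit weight quantization so the
# 109B-parameter checkpoint fits on a single high-memory GPU.
# Requires transformers >= 4.51.0 plus accelerate and bitsandbytes.
# The repo id and quantization settings below are illustrative assumptions.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint name

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits at load time
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in BF16
)

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers automatically on the available GPU(s)
)
```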

Trained on approximately 40 trillion tokens from publicly available sources, licensed datasets, and Meta products/services data, Scout incorporates knowledge through August 2024. Training consumed 5.0 million GPU hours on H100-80GB hardware, with releases available in BF16 format and int4 quantization support. The model supports a 10 million token context window and can process up to 5 input images simultaneously.

Multilingual Capabilities

Scout provides comprehensive multilingual support across 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. This enables consistent performance across diverse linguistic contexts while maintaining the model's efficiency advantages.

Performance Benchmarks

Scout demonstrates competitive results across multiple evaluation domains:

Pre-trained Model Performance:

  • General Knowledge: 79.6 on MMLU (comparable to Llama 3.1 70B at 79.3)
  • Mathematical Reasoning: 50.3 on MATH
  • Code Generation: 67.8 pass@1 on MBPP
  • Chart Interpretation: 83.4 accuracy on ChartQA

Instruction-Tuned Performance:

  • Advanced Reasoning: 74.3 accuracy on MMLU Pro
  • Expert-Level Science: 57.2 accuracy on GPQA Diamond
  • Document Understanding: 94.4 ANLS on DocVQA

These results reflect Scout's balance between performance and computational efficiency, making it suitable for applications where resource constraints matter.

Use Cases

The model excels in applications requiring multimodal understanding with deployment efficiency:

  • Assistant-like conversational experiences combining text and visual context
  • Visual reasoning and logical inference from images
  • Document analysis and information extraction from visual materials
  • Chart and diagram interpretation for data analysis
  • Code generation with multilingual support
  • Production deployments requiring efficient resource utilization

Safety and Safeguards

Meta implements comprehensive risk mitigation through three approaches: fine-tuning that emphasizes a natural refusal tone while reducing false rejections; system-level protections including Llama Guard, Prompt Guard, and Code Shield; and extensive red teaming focused on CBRNE proliferation, child safety, and cyber-attack enablement. Meta's recommended system prompt encourages a conversational tone and discourages preachy or templated language patterns.
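
For deployments that add the system-level layer themselves, one common pattern is to screen user input with a Llama Guard classifier before it reaches Scout. The sketch below is an assumed setup rather than something documented on this page: the guard checkpoint id is illustrative, and it assumes the guard model replies with a verdict beginning with "safe" or "unsafe".

```python
# Hypothetical pre-screening step: ask a Llama Guard model to classify the
# user message and only forward it to Scout when the verdict is "safe".
# The checkpoint id and verdict format are assumptions for illustration.
from transformers import pipeline

guard = pipeline("text-generation", model="meta-llama/Llama-Guard-3-8B")  # assumed guard checkpoint

def is_safe(user_message: str) -> bool:
    result = guard([{"role": "user", "content": user_message}], max_new_tokens=20)
    verdict = result[0]["generated_text"][-1]["content"]  # assistant reply from the chat output
    return verdict.strip().lower().startswith("safe")

if is_safe("How do I read a CSV file in Python?"):
    print("Forward the request to Scout")
```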

Technical Integration

Scout integrates with the transformers library (version 4.51.0+) using flex_attention for optimal performance. The model's implementation demonstrates straightforward integration into existing workflows, with support for both standard BF16 inference and efficient int4 quantization for resource-constrained environments.
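
A minimal sketch of that integration is shown below, following the standard transformers chat-template flow with flex_attention enabled; the checkpoint name and the image URL are placeholders rather than values taken from this page.

```python
# Minimal sketch: multimodal (image + text) chat with Scout via transformers
# >= 4.51.0, using the flex_attention implementation noted above.
# The repo id and image URL are illustrative placeholders.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="flex_attention",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image
            {"type": "text", "text": "Summarize the trend shown in this chart."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
reply = processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0]
print(reply)
```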

Quick Start Guide

Choose a model and click 'Deploy' above to find available GPUs recommended for this model.

Rent your dedicated instance preconfigured with the model you've selected.

Start sending requests to your model instance and getting responses right now.
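
As an illustration of that last step, the request below assumes the rented instance exposes an OpenAI-compatible chat-completions endpoint (as common inference servers such as vLLM do); the host, port, API key, and model name are placeholders, not values provided by this page.

```python
# Hypothetical request to a deployed Scout instance, assuming an
# OpenAI-compatible /v1/chat/completions endpoint. Host, port, key,
# and model name are placeholders.
import requests

response = requests.post(
    "http://YOUR_INSTANCE_IP:8000/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
        "messages": [
            {"role": "user", "content": "Give a two-sentence overview of mixture-of-experts models."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```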
