Gemma 4 26B A4B IT: Mixture-of-Experts Vision-Language Model
Gemma 4 is Google DeepMind's next-generation family of open multimodal models. The 26B A4B variant is a Mixture-of-Experts model with 25.2B total parameters, only 3.8B of which are active per token (hence the "A4B" suffix), delivering frontier-level quality at the inference speed of a much smaller dense model. It handles text and image input natively, supports a 256K-token context window, and is pre-trained on 140+ languages.
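The total-vs-active split follows from the MoE accounting: every token runs the dense components plus a fixed number of routed experts. A back-of-the-envelope sketch, where the per-component sizes below are illustrative assumptions chosen to roughly reproduce the reported totals, not published figures:

```python
N_EXPERTS, TOP_K = 128, 8

# Assumed (not published): parameters every token uses -- attention layers,
# embeddings, router, and the always-on shared expert.
shared_params_b = 2.37   # billions (assumption)
# Assumed (not published): parameters per routed expert.
per_expert_b = 0.178     # billions (assumption)

total_b = shared_params_b + N_EXPERTS * per_expert_b
active_b = shared_params_b + TOP_K * per_expert_b

print(f"total  ~ {total_b:.1f}B")   # ~25.2B, matching the reported total
print(f"active ~ {active_b:.1f}B")  # ~3.8B, matching the reported active count
```

Only the 8 routed experts plus the shared components run per token, which is why inference cost tracks the ~4B active figure rather than the 25B total.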
Key Features
- Mixture-of-Experts Architecture - 128 fine-grained experts with top-8 routing plus a shared expert, activating only ~4B parameters per token for efficient inference.
- Hybrid Attention - Interleaves sliding window (local) and full global attention layers, with unified Keys and Values on global layers and Proportional RoPE (p-RoPE) for long context efficiency.
- Reasoning / Thinking Mode - Built-in configurable thinking mode lets the model reason step-by-step before answering.
- Multimodal - Native text and image understanding with variable aspect ratio and resolution support; video analysis via frame sequences.
- Function Calling - Native structured tool use for agentic workflows.
- Long Context - 256K token context window for document analysis, long-form reasoning, and agent trajectories.
- Multilingual - Out-of-the-box support for 35+ languages, pre-trained on 140+.
- Native System Prompts - First-class support for the system role.
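The top-8 routing described above can be sketched in a few lines of NumPy. This is a minimal illustration of the pattern (softmax router, keep the 8 highest-scoring experts, renormalize their weights, and always add the shared expert); the exact routing details in Gemma 4, such as normalization order, are assumptions here:

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D_MODEL = 128, 8, 64  # D_MODEL shrunk for illustration

def route(hidden, router_w):
    """Top-k routing: score all experts, keep the 8 best, renormalize."""
    logits = hidden @ router_w                        # (N_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]                 # indices of the 8 largest
    scores = np.exp(logits[top] - logits[top].max())  # stable softmax over top-k
    return top, scores / scores.sum()

hidden = rng.standard_normal(D_MODEL)
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))

# Each expert is an FFN in the real model; random linear maps stand in here.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02
           for _ in range(N_EXPERTS)]
shared_expert = rng.standard_normal((D_MODEL, D_MODEL)) * 0.02

top, weights = route(hidden, router_w)
# Output = shared expert (always on) + weighted sum of the 8 routed experts.
out = hidden @ shared_expert
for idx, w in zip(top, weights):
    out += w * (hidden @ experts[idx])

print(len(top), round(weights.sum(), 6))  # 8 experts fire, weights sum to 1
```

Each token thus touches only 8 of the 128 routed expert FFNs, which is what keeps per-token compute near the 4B active-parameter budget.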
Use Cases
- Document and PDF parsing, OCR (including multilingual and handwriting)
- Chart, diagram, and screen/UI understanding
- Long-context reasoning and summarization
- Code generation, completion, and correction
- Agentic workflows with structured function calling
- Visual question answering and image analysis
- Multilingual chat and translation
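For the agentic use case, the typical loop is: the model emits a structured tool call, the application executes the matching function, and the result is fed back. The JSON shape and tool name below are hypothetical stand-ins for illustration; the model's actual wire format is defined by its chat template:

```python
import json

# Hypothetical tool registry; the name and schema are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub in place of a real API call

TOOLS = {"get_weather": get_weather}

# Pretend the model emitted this structured tool call:
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # -> Sunny in Zurich
```

The result string would then be appended to the conversation (typically as a tool-role message) so the model can compose its final answer.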
Architecture
Gemma 4 26B A4B uses a 30-layer MoE transformer with a 1024-token sliding window on local attention layers and unified Keys/Values on global layers, paired with a ~550M-parameter vision encoder. Each expert is a GELU-activated FFN, and routing selects 8 of the 128 experts plus 1 shared expert per token.
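The local/global split comes down to two attention masks. A minimal sketch of the standard constructions (the 1024-token window is from the text above; any further masking details in Gemma 4 are not assumed here):

```python
import numpy as np

WINDOW = 1024  # sliding-window size on local layers

def causal_mask(seq_len):
    """Global layers: each token attends to all previous tokens and itself."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

def sliding_window_mask(seq_len, window=WINDOW):
    """Local layers: each token attends only to the last `window` tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

m = sliding_window_mask(8, window=4)
print(m.astype(int))
# Token 7 sees only tokens 4..7; on a global layer it would see 0..7.
```

Because local layers cap each token's key/value span at the window size, their KV-cache cost stays constant as the sequence grows; only the interleaved global layers pay full 256K-context attention, which is where the unified Keys/Values and p-RoPE help.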
Benchmarks
Instruction-tuned results reported by Google DeepMind (selected):
- MMLU Pro: 82.6%
- AIME 2026 (no tools): 88.3%
- LiveCodeBench v6: 77.1%
- GPQA Diamond: 82.3%
- BigBench Extra Hard: 64.8%
- MMMLU: 86.3%
- MMMU Pro (vision): 73.8%
- MATH-Vision: 82.4%
- MRCR v2 8-needle 128K: 44.1%
For full benchmark tables and model family comparisons, see the model card on Hugging Face.