Gemma 4 31B IT: Dense Vision-Language Model
Gemma 4 is Google DeepMind's next-generation family of open multimodal models. The 31B variant is the dense flagship of the family, built to deliver frontier-level reasoning, coding, and multimodal understanding on consumer GPUs and workstations. It natively handles text and image input, supports a 256K context window, and covers 140+ languages.
Key Features
- Dense 31B Architecture - 30.7B-parameter dense transformer targeting the highest-quality end of the Gemma 4 family.
- Hybrid Attention - Interleaves sliding-window (local) and full global attention layers, with unified keys and values on the global layers and Proportional RoPE (p-RoPE) for efficient long-context processing.
- Reasoning / Thinking Mode - Built-in configurable thinking mode lets the model reason step-by-step before answering.
- Multimodal - Native text and image understanding with variable aspect ratio and resolution support; video analysis via frame sequences.
- Function Calling - Native structured tool use with a custom tool-call protocol for agentic workflows.
- Long Context - 256K token context window for document analysis, long-form reasoning, and agent trajectories.
- Multilingual - Out-of-the-box support for 35+ languages, pre-trained on 140+.
- Native System Prompts - First-class support for the system role.
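The hybrid local/global attention scheme above can be illustrated with a toy mask-construction sketch. This is an illustration only (an 8-token sequence with a 4-token window; the real model uses a 1024-token window), and the function names are hypothetical:

```python
def causal_global_mask(seq_len):
    # Full causal mask: each query position attends to every
    # earlier position (and itself).
    return [[q >= k for k in range(seq_len)] for q in range(seq_len)]

def sliding_window_mask(seq_len, window):
    # Local causal mask: each query position attends only to the
    # previous `window` positions (inclusive of itself).
    return [[q >= k and q - k < window for k in range(seq_len)]
            for q in range(seq_len)]

seq_len, window = 8, 4
local = sliding_window_mask(seq_len, window)
glob = causal_global_mask(seq_len)

# The last token sees all 8 positions under global attention...
assert sum(glob[7]) == 8
# ...but only the most recent 4 under the sliding window.
assert sum(local[7]) == 4
```

Interleaving the two layer types trades a small amount of global connectivity for a KV cache on local layers that stops growing once the context exceeds the window size.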
Use Cases
- Document and PDF parsing, OCR (including multilingual and handwriting)
- Chart, diagram, and screen/UI understanding
- Long-context reasoning and summarization
- Code generation, completion, and correction
- Agentic workflows with structured function calling
- Visual question answering and image analysis
- Multilingual chat and translation
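For the agentic use case above, the control flow of a structured function-calling loop looks roughly like the sketch below. The exact Gemma tool-call wire format is not documented here; the JSON shape, tool names, and helper function are hypothetical:

```python
import json

# Hypothetical tool registry -- this sketches only the control flow,
# not the model's actual tool-call protocol.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def handle_model_output(text):
    # Assume the model emits a JSON object when it wants to call a tool,
    # and plain prose otherwise.
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain answer, no tool call
    result = TOOLS[call["name"]](**call["arguments"])
    # In a real loop this message would be appended to the conversation
    # and sent back to the model for the final answer.
    return {"role": "tool", "name": call["name"], "content": result}

msg = handle_model_output(
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
)
assert msg["content"]["city"] == "Oslo"
```

In practice the parse/dispatch/append loop repeats until the model produces a plain-text answer instead of another tool call.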
Architecture
Gemma 4 31B IT is a 60-layer dense transformer with a 1024-token sliding window on local attention layers and unified keys/values on global layers, paired with a ~550M-parameter vision encoder. The final layer is always global, so the model retains full-context awareness for long-context tasks while the local layers keep the KV-cache memory footprint manageable.
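One practical payoff of the hybrid design is KV-cache size: local layers cache at most 1024 positions no matter how long the context grows. The back-of-envelope sketch below compares an all-global cache to a hybrid one at the full 256K context. The layer count, window, and context length are stated above; the local:global ratio, KV head count, head dimension, and separate key/value caching are assumptions for illustration only:

```python
# Stated figures
NUM_LAYERS = 60          # 60-layer transformer
WINDOW = 1024            # sliding-window size on local layers
CONTEXT = 256 * 1024     # 256K-token context window

# Assumed figures (not from the model card)
GLOBAL_EVERY = 6         # assumption: 1 global layer per 6 layers
KV_HEADS = 8             # assumption
HEAD_DIM = 128           # assumption
BYTES = 2                # bf16

def kv_bytes(cached_positions, layers):
    # keys + values: 2 tensors of (heads * head_dim) per cached position
    return 2 * KV_HEADS * HEAD_DIM * BYTES * cached_positions * layers

num_global = NUM_LAYERS // GLOBAL_EVERY   # 10 global layers
num_local = NUM_LAYERS - num_global       # 50 local layers

all_global = kv_bytes(CONTEXT, NUM_LAYERS)
hybrid = kv_bytes(CONTEXT, num_global) + kv_bytes(WINDOW, num_local)

print(f"all-global: {all_global / 2**30:.1f} GiB")  # 60.0 GiB
print(f"hybrid:     {hybrid / 2**30:.1f} GiB")      # 10.2 GiB
```

Under these assumptions the hybrid layout caches roughly a sixth of what an all-global stack would at 256K tokens, which is what makes consumer-GPU deployment of long contexts plausible.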
Benchmarks
Instruction-tuned results reported by Google DeepMind (selected):
- MMLU Pro: 85.2%
- AIME 2026 (no tools): 89.2%
- LiveCodeBench v6: 80.0%
- Codeforces ELO: 2150
- GPQA Diamond: 84.3%
- Tau2 (average over 3): 76.9%
- HLE (no tools): 19.5%
- HLE (with search): 26.5%
- BigBench Extra Hard: 74.4%
- MMMLU: 88.4%
- MMMU Pro (vision): 76.9%
- MATH-Vision: 85.6%
- MedXPertQA MM: 61.3%
- MRCR v2 8-needle 128K: 66.4%
For full benchmark tables and model family comparisons, see the model card on Hugging Face.