Gemma 4 31B IT

LLM
Vision Language
Chat
Reasoning

Gemma 4 31B dense vision-language model by Google with 256K context and thinking mode

On-Demand Dedicated 1xH200

Details

Modalities

text, vision

Recommended Hardware

1xH200

Provider

google

Family

gemma

Parameters

31B

Context

262144 tokens

License

apache-2.0

Gemma 4 31B IT: Dense Vision-Language Model

Gemma 4 is Google DeepMind's next-generation family of open multimodal models. The 31B variant is the dense flagship of the family, built to deliver frontier-level reasoning, coding, and multimodal understanding on consumer GPUs and workstations. It natively handles text and image input, supports a 256K context window, and covers 140+ languages.

Key Features

  • Dense 31B Architecture - 30.7B-parameter dense transformer targeting the highest-quality end of the Gemma 4 family.
  • Hybrid Attention - Interleaves sliding-window (local) and full global attention layers, with unified keys and values on global layers and Proportional RoPE (p-RoPE) for efficient long-context processing.
  • Reasoning / Thinking Mode - Built-in configurable thinking mode lets the model reason step-by-step before answering.
  • Multimodal - Native text and image understanding with variable aspect ratio and resolution support; video analysis via frame sequences.
  • Function Calling - Native structured tool use with a custom tool-call protocol for agentic workflows.
  • Long Context - 256K token context window for document analysis, long-form reasoning, and agent trajectories.
  • Multilingual - Out-of-the-box support for 35+ languages, pre-trained on 140+.
  • Native System Prompts - First-class support for the system role.
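
The thinking mode and system-prompt support above might be exercised like this through an OpenAI-style chat endpoint. This is a minimal sketch: the endpoint path, the model id string, and the `reasoning_effort` field name are assumptions, not confirmed API details — check your deployment's API reference for the exact names.

```python
# Sketch: a chat request with thinking mode enabled. The endpoint URL,
# model id, and "reasoning_effort" field are assumptions for illustration.
import json
import urllib.request

def build_chat_request(prompt: str, thinking: bool = True) -> dict:
    """Build an OpenAI-style chat payload (field names are assumptions)."""
    return {
        "model": "gemma-4-31b-it",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        # Hypothetical switch for the built-in configurable thinking mode.
        "reasoning_effort": "high" if thinking else "none",
    }

def send(url: str, payload: dict) -> dict:
    """POST the payload as JSON and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# payload = build_chat_request("Why is the sky blue?")
# reply = send("https://<your-instance>/v1/chat/completions", payload)
```

Disabling thinking (`thinking=False`) trades step-by-step reasoning for lower latency on simple queries.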

Use Cases

  • Document and PDF parsing, OCR (including multilingual and handwriting)
  • Chart, diagram, and screen/UI understanding
  • Long-context reasoning and summarization
  • Code generation, completion, and correction
  • Agentic workflows with structured function calling
  • Visual question answering and image analysis
  • Multilingual chat and translation
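
For the visual question answering and document-parsing use cases above, an image is typically attached as a base64 data URI in an OpenAI-style multimodal message. The content-part schema and model id below are assumptions about the serving API, not confirmed details of this deployment.

```python
# Sketch: attaching an image for visual question answering using the
# OpenAI-style "image_url" content part. Field names are assumptions.
import base64

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Build a multimodal chat payload with one image and one question."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": "gemma-4-31b-it",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

Video analysis works the same way: sample frames from the clip and attach them as a sequence of image parts in a single message.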

Architecture

Gemma 4 31B IT is a 60-layer dense transformer with a 1024-token sliding window on local attention layers and unified keys/values on global layers, paired with a ~550M-parameter vision encoder. The final layer is always global, so the model retains full-context awareness for long-context tasks while the local layers keep the memory footprint manageable.
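
The local/global split can be sketched as follows. Only the 60-layer depth, the 1024-token window, and the final-layer-is-global property come from the text; the every-sixth-layer-is-global interleave ratio is an assumption for illustration.

```python
# Sketch of the hybrid local/global attention pattern. The 1:6 interleave
# ratio is an assumption; depth, window size, and the global final layer
# are from the model description.
WINDOW = 1024
NUM_LAYERS = 60

def layer_is_global(layer: int, ratio: int = 6) -> bool:
    """Every `ratio`-th layer is global; the final layer always is."""
    return (layer + 1) % ratio == 0 or layer == NUM_LAYERS - 1

def visible_span(query_pos: int, layer: int) -> range:
    """Key positions a causal query at `query_pos` may attend to."""
    if layer_is_global(layer):
        return range(0, query_pos + 1)          # full causal attention
    start = max(0, query_pos - WINDOW + 1)      # sliding-window attention
    return range(start, query_pos + 1)

# A query at position 5000 on a local layer sees only the last 1024 tokens,
# while a global layer sees the entire 5001-token prefix:
assert len(visible_span(5000, layer=0)) == 1024
assert len(visible_span(5000, layer=59)) == 5001
```

This is why local layers keep KV-cache growth bounded (each caches at most 1024 tokens of keys/values) while the periodic global layers preserve access to the full 256K context.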

Benchmarks

Instruction-tuned results reported by Google DeepMind (selected):

  • MMLU Pro: 85.2%
  • AIME 2026 (no tools): 89.2%
  • LiveCodeBench v6: 80.0%
  • Codeforces ELO: 2150
  • GPQA Diamond: 84.3%
  • Tau2 (average over 3): 76.9%
  • HLE (no tools): 19.5%
  • HLE (with search): 26.5%
  • BigBench Extra Hard: 74.4%
  • MMMLU: 88.4%
  • MMMU Pro (vision): 76.9%
  • MATH-Vision: 85.6%
  • MedXPertQA MM: 61.3%
  • MRCR v2 8-needle 128K: 66.4%

For full benchmark tables and model family comparisons, see the model card on HuggingFace.

Quick Start Guide

  1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
  2. Rent your dedicated instance, preconfigured with the model you've selected.
  3. Start sending requests to your model instance and get responses right away.
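
Once your instance is up, a first request for an agentic workflow might declare a tool alongside the prompt. This is a sketch using the common OpenAI-style `tools` schema; the model card notes Gemma 4 uses a custom tool-call protocol, so consult its documentation for the exact wire format. The `get_weather` tool is a hypothetical example.

```python
# Sketch: declaring a tool for function calling. The OpenAI-style "tools"
# schema and the get_weather tool are assumptions for illustration.
def build_tool_request(prompt: str) -> dict:
    """Build a chat payload that offers the model one callable tool."""
    return {
        "model": "gemma-4-31b-it",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }
```

If the model decides to call the tool, run it yourself, append the result as a `tool`-role message, and send a follow-up request so the model can compose its final answer.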