Gemma 4 26B A4B IT

LLM
Vision Language
Chat
MoE
Reasoning

Gemma 4 26B A4B MoE vision-language model by Google with 256K context and thinking mode

Details

Modalities

text, vision

Recommended Hardware

1xH100 SXM

Provider

google

Family

gemma

Parameters

26B

Context

262144 tokens

License

apache-2.0

Gemma 4 26B A4B IT: Mixture-of-Experts Vision-Language Model

Gemma 4 is Google DeepMind's next-generation family of open multimodal models. The 26B A4B variant is a Mixture-of-Experts model with 25.2B total parameters but only 3.8B active per token, delivering frontier-level quality at the inference speed of a much smaller dense model. It handles text and image input natively, supports a 256K context window, and covers 140+ languages.

Key Features

  • Mixture-of-Experts Architecture - 128 fine-grained experts with top-8 routing and a shared expert, activating only ~4B parameters per token for efficient inference.
  • Hybrid Attention - Interleaves sliding window (local) and full global attention layers, with unified Keys and Values on global layers and Proportional RoPE (p-RoPE) for long context efficiency.
  • Reasoning / Thinking Mode - Built-in configurable thinking mode lets the model reason step-by-step before answering.
  • Multimodal - Native text and image understanding with variable aspect ratio and resolution support; video analysis via frame sequences.
  • Function Calling - Native structured tool use for agentic workflows.
  • Long Context - 256K token context window for document analysis, long-form reasoning, and agent trajectories.
  • Multilingual - Out-of-the-box support for 35+ languages, pre-trained on 140+.
  • Native System Prompts - First-class support for the system role.
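The interleaved local/global attention pattern can be sketched as a pair of boolean masks. This is a toy illustration with a small window for readability (the real model uses a 1024-token window on local layers, per the Architecture section):

```python
import numpy as np

def causal_mask(seq_len, window=None):
    """Boolean attention mask: True where query i may attend to key j.

    window=None -> full causal (global) attention;
    window=w    -> causal sliding-window (local) attention, as in
    hybrid stacks that interleave both layer types.
    """
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    mask = j <= i                    # causal: no attending to the future
    if window is not None:
        mask &= (i - j) < window     # local: only the last `window` keys
    return mask
```

For example, `causal_mask(5, window=2)` lets position 4 see only positions 3 and 4, while `causal_mask(5)` lets it see all earlier positions; alternating the two mask types layer by layer keeps the KV-cache cost of most layers bounded by the window size.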

Use Cases

  • Document and PDF parsing, OCR (including multilingual and handwriting)
  • Chart, diagram, and screen/UI understanding
  • Long-context reasoning and summarization
  • Code generation, completion, and correction
  • Agentic workflows with structured function calling
  • Visual question answering and image analysis
  • Multilingual chat and translation
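For the function-calling use case, a request typically offers the model a list of tool schemas. The sketch below assumes the common OpenAI-style JSON-schema tool format; the tool name, model id, and exact request shape are illustrative placeholders, not confirmed by this page:

```python
import json

# Hypothetical tool definition in the OpenAI-style JSON-schema format
# (an assumption about the serving endpoint, not confirmed here).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def build_tool_call_payload(model, user_msg, tools):
    """Assemble a chat-completions request body that offers tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
    }

payload = build_tool_call_payload(
    "gemma-4-26b-a4b-it",  # placeholder model id
    "What's the weather in Oslo?",
    [get_weather_tool],
)
print(json.dumps(payload, indent=2))
```

When the model decides a tool is needed, the response carries a structured call (tool name plus JSON arguments) instead of free text, which the agent executes and feeds back as a tool-role message.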

Architecture

Gemma 4 26B A4B uses a 30-layer MoE transformer with a 1024-token sliding window on local attention layers and unified keys/values on global layers, paired with a ~550M-parameter vision encoder. Each expert is a GELU-activated FFN, and the router selects 8 of the 128 experts plus 1 shared expert per token.
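The top-8-plus-shared routing described above can be sketched at the level of a single token. This is a toy illustration (expert count, dimensions, and expert functions are stand-ins, not the real Gemma 4 configuration):

```python
import numpy as np

def moe_route(x, gate_w, experts, shared_expert, top_k=8):
    """Toy token-level MoE forward pass: score all experts with a
    linear gate, keep the top_k, mix their outputs by softmax weight,
    and always add the shared expert's output.
    """
    logits = x @ gate_w                        # (num_experts,) router scores
    top = np.argsort(logits)[-top_k:]          # indices of the top_k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over selected experts
    out = sum(wi * experts[ei](x) for wi, ei in zip(w, top))
    return out + shared_expert(x)              # shared expert is always active
```

Because only `top_k` expert FFNs (plus the shared one) run per token, compute scales with the active parameter count rather than the total, which is how a ~25B-parameter model can decode at roughly the cost of a ~4B dense model.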

Benchmarks

Instruction-tuned results reported by Google DeepMind (selected):

  • MMLU Pro: 82.6%
  • AIME 2026 (no tools): 88.3%
  • LiveCodeBench v6: 77.1%
  • GPQA Diamond: 82.3%
  • BigBench Extra Hard: 64.8%
  • MMMLU: 86.3%
  • MMMU Pro (vision): 73.8%
  • MATH-Vision: 82.4%
  • MRCR v2 8-needle 128K: 44.1%

For full benchmark tables and model family comparisons, see the model card on HuggingFace.

Quick Start Guide

1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.

2. Rent your dedicated instance, preconfigured with the model you've selected.

3. Start sending requests to your model instance and receiving responses right away.
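Once the instance is live, step 3 usually amounts to a plain HTTP call. The sketch below assumes an OpenAI-compatible `/v1/chat/completions` endpoint; the base URL, API key, and model id are placeholders to replace with the values shown on your instance page:

```python
import json
import urllib.request

def chat_request(base_url, api_key, model, prompt):
    """Build (not send) an HTTP request for an OpenAI-compatible
    /v1/chat/completions endpoint. URL, key, and model id here are
    placeholders, not values confirmed by this page.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = chat_request("https://your-instance.example.com", "YOUR_API_KEY",
                   "gemma-4-26b-a4b-it", "Summarize this page in one line.")
# urllib.request.urlopen(req) would send it once the instance is running.
```

The response, under this assumption, arrives as JSON with the generated text in `choices[0].message.content`.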