Qwen3.5 35B A3B: Unified Vision-Language MoE Reasoning Model
Qwen3.5 35B A3B is a multimodal mixture-of-experts foundation model from Alibaba's Qwen team, featuring a hybrid Gated DeltaNet and sparse MoE architecture. It has 35 billion total parameters with 3 billion activated per token, delivering high-throughput inference with low latency. The model was trained with early fusion on multimodal tokens to achieve native vision-language understanding alongside strong text reasoning, coding, and agentic capabilities.
Key Features
- Unified Vision-Language Foundation - Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks
- Efficient Hybrid Architecture - Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead
- Scalable RL Generalization - Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability
- Global Linguistic Coverage - Expanded support to 201 languages and dialects for inclusive worldwide deployment
- Long Context - 262,144 tokens natively, extensible up to 1,010,000 tokens with YaRN
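The long-context figures above imply a rope-scaling factor of roughly 3.85. A minimal sketch of that arithmetic, assuming the common Hugging Face `rope_scaling` config convention for YaRN (the field names are an assumption, not confirmed for this model):

```python
# Sketch: the YaRN scaling factor needed to stretch the native
# 262,144-token window to roughly 1,010,000 tokens.
NATIVE_CONTEXT = 262_144   # native window stated in the model card
TARGET_CONTEXT = 1_010_000  # extended window with YaRN

factor = TARGET_CONTEXT / NATIVE_CONTEXT  # ≈ 3.85

# Assumed config shape, following the usual Hugging Face convention;
# the exact field names for this model are not confirmed here.
rope_scaling = {
    "rope_type": "yarn",
    "factor": round(factor, 2),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(rope_scaling)
```

In practice such a block would be dropped into the model's `config.json` (or passed at load time) only when prompts actually exceed the native window, since rope scaling can slightly degrade short-context quality.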
Architecture
- Causal Language Model with Vision Encoder
- 35B total parameters, 3B activated per token
- 40 layers with a 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) hybrid layout
- Mixture of Experts with 256 experts, 8 routed + 1 shared activated
- Multi-token prediction (MTP), trained with multi-step prediction
- Native 262K context, extensible to 1M tokens
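The layer arithmetic above can be sanity-checked by unrolling the stated pattern: ten repeats of three (Gated DeltaNet → MoE) layers followed by one (Gated Attention → MoE) layer gives the 40-layer stack. A minimal sketch, using illustrative labels rather than the model's actual module names:

```python
# Unroll the hybrid layout: 10 x (3 x (Gated DeltaNet -> MoE)
#                                 -> 1 x (Gated Attention -> MoE)).
def build_layout(repeats: int = 10) -> list[str]:
    layers = []
    for _ in range(repeats):
        layers += ["gated_deltanet+moe"] * 3  # 3 linear-attention layers
        layers += ["gated_attention+moe"]     # 1 full-attention layer
    return layers

layout = build_layout()
assert len(layout) == 40                       # matches "40 layers"
assert layout.count("gated_deltanet+moe") == 30
assert layout.count("gated_attention+moe") == 10

# Every layer ends in an MoE block: of 256 routable experts, 8 are
# selected per token, plus 1 always-on shared expert -> 9 active.
routed, shared = 8, 1
active_experts_per_layer = routed + shared
print(len(layout), active_experts_per_layer)
```

This 3:1 ratio of linear-attention to full-attention layers is what keeps per-token compute and KV-cache growth low at long context while retaining some full-attention layers for global mixing.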
Use Cases
- Multimodal reasoning and visual question answering
- Document, chart, and diagram understanding
- Coding and software engineering agents
- Tool-using agent workflows across long horizons
- Multilingual chat and instruction following across 201 languages
- Long-context analysis and retrieval over large document sets
Benchmarks
On the Qwen3.5 benchmark suite, Qwen3.5 35B A3B scores MMLU-Pro 85.3, MMLU-Redux 93.3, C-Eval 90.2, SuperGPQA 63.4, IFEval 91.9, GPQA Diamond 84.2, and LongBench v2 59.0, placing it on par with much larger MoE peers while activating only 3B parameters per token.