Qwen3.5 397B A17B: Efficient Multimodal Reasoning with Hybrid Attention
Qwen3.5 397B A17B is a multimodal mixture-of-experts language model developed by the Qwen team, featuring a novel hybrid architecture that interleaves Gated Delta Networks (linear attention) with standard Gated Attention blocks. It supports text, image, and video inputs with native reasoning capabilities across 201 languages and dialects.
Key Features
- Hybrid DeltaNet-Attention Architecture - Alternating blocks of Gated Delta Networks (linear attention) and Gated Attention with grouped-query heads, enabling efficient long-context processing while maintaining strong attention quality
- Sparse Mixture-of-Experts - 512 total experts with 10 routed and 1 shared expert active per token, delivering high capacity with efficient inference
- Native Multimodal Support - Early fusion training enables unified processing of text, images, and video inputs with near-parity to text-only performance
- Interleaved Thinking - Default reasoning mode generates structured thinking traces before responses, with per-turn control for balancing accuracy against latency (see the request sketch after this list)
- Tool Use and Agentic Workflows - Native support for function calling and multi-step agent-based task execution
- Multilingual Coverage - Supports 201 languages and dialects with strong performance across diverse linguistic contexts
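To illustrate per-turn thinking control, the sketch below assumes the model is served behind an OpenAI-compatible endpoint (as with vLLM or SGLang); the base URL, model identifier, and the `enable_thinking` chat-template flag are illustrative assumptions rather than confirmed API details for this release.

```python
# Minimal sketch of per-turn reasoning control against an assumed
# OpenAI-compatible server. The base_url, model name, and the
# "enable_thinking" flag are illustrative, not confirmed API details.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Turn 1: keep the default interleaved thinking trace (higher accuracy).
deep = client.chat.completions.create(
    model="Qwen3.5-397B-A17B",  # hypothetical identifier
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)

# Turn 2: suppress the thinking trace for a latency-sensitive reply.
fast = client.chat.completions.create(
    model="Qwen3.5-397B-A17B",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
```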
Benchmark Performance
Reasoning and Mathematics:
- AIME 2026: 91.3%
- HMMT Feb 2025: 94.8%
- GPQA Diamond: 88.4%
Knowledge and Instruction:
- MMLU-Pro: 87.8
- SuperGPQA: 70.4
- C-Eval: 93.0
- IFBench: 76.5
Coding and Software Engineering:
- SWE-bench Verified: 76.4%
- LiveCodeBench v6: 83.6
- SecCodeBench: 68.3
Tool Use and Agent Tasks:
- BFCL-V4: 72.9
- TAU2-Bench: 86.7
- Tool-Decathlon: 38.3
Vision and Multimodal:
- MMMU: 85.0
- MathVision: 88.6
- OmniDocBench: 90.8
- OCRBench: 93.1
- VideoMME (with subtitles): 87.5
Use Cases
- Complex mathematical reasoning and competition-level problem solving
- Multi-turn agentic workflows with tool calling and structured reasoning
- Code generation, debugging, and real-world software engineering tasks
- Document analysis, OCR, and visual question answering
- Video understanding and temporal reasoning
- Multilingual applications spanning 201 languages
- Multi-step research tasks requiring tool integration
- Image-based reasoning and spatial understanding
Architecture
Qwen3.5 397B A17B employs a 60-layer hybrid architecture built from 15 repetitions of a four-block cycle: three Gated DeltaNet blocks followed by one Gated Attention block, with every block paired with a mixture-of-experts feed-forward layer.
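The layer plan below is a minimal sketch derived directly from the numbers above (15 cycles of 4 blocks yields 60 layers); the block names and helper are hypothetical, not taken from the released code.

```python
# Illustrative sketch of the 60-layer hybrid layout: 15 cycles, each
# with three Gated DeltaNet blocks followed by one Gated Attention
# block. Names are hypothetical placeholders.
CYCLES = 15
PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]

def build_layer_plan() -> list[str]:
    return [block for _ in range(CYCLES) for block in PATTERN]

plan = build_layer_plan()
assert len(plan) == 60
assert plan[:4] == ["gated_deltanet", "gated_deltanet", "gated_deltanet", "gated_attention"]
# Every block is paired with a mixture-of-experts feed-forward layer.
```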
Gated Delta Networks provide efficient linear attention with a fixed-size recurrent state, enabling long-context processing without the growing key-value cache and quadratic compute cost of standard attention. The interleaved Gated Attention blocks use grouped-query attention with 32 query heads and 2 key-value heads, preserving the model's ability to perform precise token-level attention when needed.
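To make the fixed-size-state property concrete, here is a toy implementation of the gated delta rule as described in the Gated DeltaNet literature; the dimensions, gating scalars, and normalization are simplified assumptions and do not reflect the model's exact parameterization.

```python
# Toy gated delta rule recurrence (after the Gated DeltaNet literature).
# The state S is a fixed (d_k x d_v) matrix regardless of sequence
# length, which is what removes the growing KV cache. Gating details
# are simplified assumptions.
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrence step.
    S: (d_k, d_v) state; q, k: (d_k,); v: (d_v,)
    alpha: scalar decay gate in (0, 1]; beta: scalar write strength."""
    # Decay old memory, erase the component along k, then write k -> v.
    S = alpha * (S - beta * np.outer(k, k @ S)) + beta * np.outer(k, v)
    o = q @ S  # read-out for this token, shape (d_v,)
    return S, o

d_k, d_v, T = 8, 8, 128
rng = np.random.default_rng(0)
S = np.zeros((d_k, d_v))
for _ in range(T):  # memory stays O(d_k * d_v), independent of T
    q, k, v = rng.normal(size=d_k), rng.normal(size=d_k), rng.normal(size=d_v)
    k /= np.linalg.norm(k)
    S, o = gated_delta_step(S, q, k, v, alpha=0.98, beta=0.5)
```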
The mixture-of-experts layer routes each token through 10 of 512 available experts plus 1 shared expert, maintaining high total capacity while keeping per-token computation efficient. Multi-Token Prediction training additionally enables speculative decoding for higher inference throughput.
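The routing step can be sketched as follows; the softmax-over-top-k router and shared-expert combination shown here are a common MoE pattern assumed for illustration, not the model's confirmed implementation.

```python
# Schematic top-k MoE routing: each token activates 10 of 512 routed
# experts plus 1 always-on shared expert. The softmax-over-top-k
# router is a common pattern, assumed rather than confirmed.
import numpy as np

N_EXPERTS, TOP_K = 512, 10
rng = np.random.default_rng(0)

def route(hidden, router_w):
    logits = hidden @ router_w                      # (N_EXPERTS,)
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]  # 10 selected experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                        # normalize over top-k
    return top, weights

d_model = 64
router_w = rng.normal(size=(d_model, N_EXPERTS))
token = rng.normal(size=d_model)
experts, gates = route(token, router_w)

# The block's output combines the gated routed experts with the
# shared expert: y = shared(x) + sum_i gates[i] * expert[experts[i]](x)
```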
Training Approach
Qwen3.5 397B A17B was trained with early fusion multimodal pre-training, reaching near-parity in training efficiency between multimodal and text-only settings. Post-training uses a scalable reinforcement learning framework that supports large-scale agent scaffolds with progressively increasing task complexity, which underpins the model's strong performance on agentic and tool-use benchmarks.
Deploy Qwen3.5 397B A17B on Vast.ai for access to frontier-level multimodal reasoning, coding, and agentic capabilities with flexible GPU infrastructure.
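As a starting point, the sketch below uses vLLM's offline Python API; the Hugging Face repository id and parallelism settings are illustrative assumptions and should be sized to the GPUs you rent.

```python
# Minimal offline-inference sketch with vLLM's Python API. The model
# id and parallelism settings are illustrative assumptions; size
# tensor_parallel_size to the available GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-397B-A17B",  # hypothetical repo id
    tensor_parallel_size=8,          # split the MoE weights across 8 GPUs
    max_model_len=32768,
)
params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["Summarize the hybrid DeltaNet-attention design."], params)
print(outputs[0].outputs[0].text)
```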