Qwen3.5 397B A17B: Efficient Multimodal Reasoning with Hybrid Attention
Qwen3.5 397B A17B is a multimodal mixture-of-experts language model developed by the Qwen team, featuring a novel hybrid architecture that interleaves Gated Delta Networks (linear attention) with standard Gated Attention blocks. It supports text, image, and video inputs with native reasoning capabilities across 201 languages and dialects.
Key Features
- Hybrid DeltaNet-Attention Architecture - Alternating blocks of Gated Delta Networks (linear attention) and Gated Attention with grouped-query heads, enabling efficient long-context processing while maintaining strong attention quality
- Sparse Mixture-of-Experts - 512 total experts with 10 routed and 1 shared expert active per token, delivering high capacity with efficient inference
- Native Multimodal Support - Early fusion training enables unified processing of text, images, and video inputs with near-parity to text-only performance
- Interleaved Thinking - Default reasoning mode generates structured thinking traces before responses, with per-turn control for balancing accuracy against latency (see the request sketch after this list)
- Tool Use and Agentic Workflows - Native support for function calling and multi-step agent-based task execution
- Multilingual Coverage - Supports 201 languages and dialects with strong performance across diverse linguistic contexts
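To illustrate per-turn thinking control, the sketch below assumes the model is served behind an OpenAI-compatible endpoint (as with vLLM or SGLang); the base URL, model identifier, and the `enable_thinking` chat-template flag are illustrative assumptions rather than confirmed API details for this release.

```python
# Minimal sketch of per-turn reasoning control against an assumed
# OpenAI-compatible server. The base_url, model name, and the
# "enable_thinking" flag are illustrative, not confirmed API details.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Turn 1: keep the default interleaved thinking trace (higher accuracy).
deep = client.chat.completions.create(
    model="Qwen3.5-397B-A17B",  # hypothetical identifier
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)

# Turn 2: suppress the thinking trace for a latency-sensitive reply.
fast = client.chat.completions.create(
    model="Qwen3.5-397B-A17B",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
```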
Benchmark Performance
Reasoning and Mathematics:
- AIME 2026: 91.3%
- HMMT Feb 2025: 94.8%
- GPQA Diamond: 88.4%
Knowledge and Instruction:
- MMLU-Pro: 87.8
- SuperGPQA: 70.4
- C-Eval: 93.0
- IFBench: 76.5
Coding and Software Engineering:
- SWE-bench Verified: 76.4%
- LiveCodeBench v6: 83.6
- SecCodeBench: 68.3
Tool Use and Agent Tasks:
- BFCL-V4: 72.9
- TAU2-Bench: 86.7
- Tool-Decathlon: 38.3
Vision and Multimodal:
- MMMU: 85.0
- MathVision: 88.6
- OmniDocBench: 90.8
- OCRBench: 93.1
- VideoMME (with subtitles): 87.5
Use Cases
- Complex mathematical reasoning and competition-level problem solving
- Multi-turn agentic workflows with tool calling and structured reasoning
- Code generation, debugging, and real-world software engineering tasks
- Document analysis, OCR, and visual question answering
- Video understanding and temporal reasoning
- Multilingual applications spanning 201 languages
- Multi-step research tasks requiring tool integration
- Image-based reasoning and spatial understanding
Architecture
Qwen3.5 397B A17B employs a 60-layer hybrid architecture built from 15 repetitions of a four-block cycle: three Gated DeltaNet blocks followed by one Gated Attention block, with every block paired with a mixture-of-experts feed-forward layer.
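The layer plan below is a minimal sketch derived directly from the numbers above (15 cycles of 4 blocks yields 60 layers); the block names and helper are hypothetical, not taken from the released code.

```python
# Illustrative sketch of the 60-layer hybrid layout: 15 cycles, each
# with three Gated DeltaNet blocks followed by one Gated Attention
# block. Names are hypothetical placeholders.
CYCLES = 15
PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]

def build_layer_plan() -> list[str]:
    return [block for _ in range(CYCLES) for block in PATTERN]

plan = build_layer_plan()
assert len(plan) == 60
assert plan[:4] == ["gated_deltanet", "gated_deltanet", "gated_deltanet", "gated_attention"]
# Every block is paired with a mixture-of-experts feed-forward layer.
```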
Gated Delta Networks provide efficient linear attention with a fixed-size recurrent state, enabling long-context processing without the growing key-value cache and quadratic compute cost of standard attention. The interleaved Gated Attention blocks use grouped-query attention with 32 query heads and 2 key-value heads, preserving the model's ability to perform precise token-level attention when needed.
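To make the fixed-size-state property concrete, here is a toy implementation of the gated delta rule as described in the Gated DeltaNet literature; the dimensions, gating scalars, and normalization are simplified assumptions and do not reflect the model's exact parameterization.

```python
# Toy gated delta rule recurrence (after the Gated DeltaNet literature).
# The state S is a fixed (d_k x d_v) matrix regardless of sequence
# length, which is what removes the growing KV cache. Gating details
# are simplified assumptions.
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrence step.
    S: (d_k, d_v) state; q, k: (d_k,); v: (d_v,)
    alpha: scalar decay gate in (0, 1]; beta: scalar write strength."""
    # Decay old memory, erase the component along k, then write k -> v.
    S = alpha * (S - beta * np.outer(k, k @ S)) + beta * np.outer(k, v)
    o = q @ S  # read-out for this token, shape (d_v,)
    return S, o

d_k, d_v, T = 8, 8, 128
rng = np.random.default_rng(0)
S = np.zeros((d_k, d_v))
for _ in range(T):  # memory stays O(d_k * d_v), independent of T
    q, k, v = rng.normal(size=d_k), rng.normal(size=d_k), rng.normal(size=d_v)
    k /= np.linalg.norm(k)
    S, o = gated_delta_step(S, q, k, v, alpha=0.98, beta=0.5)
```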
The mixture-of-experts layer routes each token through 10 of 512 available experts plus 1 shared expert, maintaining high total capacity while keeping per-token computation efficient. Multi-Token Prediction training additionally enables speculative decoding for higher inference throughput.
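The routing step can be sketched as follows; the softmax-over-top-k router and shared-expert combination shown here are a common MoE pattern assumed for illustration, not the model's confirmed implementation.

```python
# Schematic top-k MoE routing: each token activates 10 of 512 routed
# experts plus 1 always-on shared expert. The softmax-over-top-k
# router is a common pattern, assumed rather than confirmed.
import numpy as np

N_EXPERTS, TOP_K = 512, 10
rng = np.random.default_rng(0)

def route(hidden, router_w):
    logits = hidden @ router_w                      # (N_EXPERTS,)
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]  # 10 selected experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                        # normalize over top-k
    return top, weights

d_model = 64
router_w = rng.normal(size=(d_model, N_EXPERTS))
token = rng.normal(size=d_model)
experts, gates = route(token, router_w)

# The block's output combines the gated routed experts with the
# shared expert: y = shared(x) + sum_i gates[i] * expert[experts[i]](x)
```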
Training Approach
Qwen3.5 397B A17B was trained with early fusion multimodal pre-training, reaching near-parity in training efficiency between multimodal and text-only settings. Post-training uses a scalable reinforcement learning framework that supports large-scale agent scaffolds with progressively increasing task complexity, which underpins the model's strong performance on agentic and tool-use benchmarks.
Deploy Qwen3.5 397B A17B on Vast.ai for access to frontier-level multimodal reasoning, coding, and agentic capabilities with flexible GPU infrastructure.
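As a starting point, the sketch below uses vLLM's offline Python API; the Hugging Face repository id and parallelism settings are illustrative assumptions and should be sized to the GPUs you rent.

```python
# Minimal offline-inference sketch with vLLM's Python API. The model
# id and parallelism settings are illustrative assumptions; size
# tensor_parallel_size to the available GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-397B-A17B",  # hypothetical repo id
    tensor_parallel_size=8,          # split the MoE weights across 8 GPUs
    max_model_len=32768,
)
params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["Summarize the hybrid DeltaNet-attention design."], params)
print(outputs[0].outputs[0].text)
```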