Efficient multimodal reasoning model with hybrid DeltaNet-attention architecture
text, vision
3.5 397B-A17B
8xH200
Loading...
Alibaba
Qwen
397B
262144 tokens
Apache-2.0
Qwen3.5 397B A17B is a multimodal mixture-of-experts language model developed by the Qwen team, featuring a novel hybrid architecture that combines Gated Delta Networks with sparse Gated Attention. It supports text, image, and video inputs with native reasoning capabilities across 201 languages and dialects.
Reasoning and Mathematics:
Knowledge and Instruction:
Coding and Software Engineering:
Tool Use and Agent Tasks:
Vision and Multimodal:
Qwen3.5 397B A17B employs a 60-layer hybrid architecture organized in a repeating pattern of 15 cycles. Each cycle consists of three Gated DeltaNet blocks followed by one Gated Attention block, with every block paired with a mixture-of-experts feed-forward layer.
Gated Delta Networks provide efficient linear attention with fixed-size recurrent state, enabling long-context processing without the quadratic memory cost of standard attention. The interleaved Gated Attention blocks use grouped-query attention with 32 query heads and 2 key-value heads, preserving the model's ability to perform precise token-level attention when needed.
The mixture-of-experts layer routes each token through 10 of 512 available experts plus 1 shared expert, enabling the model to maintain high total capacity while keeping per-token computation efficient. Multi-Token Prediction training enables speculative decoding for faster inference throughput.
Qwen3.5 397B A17B was trained with early fusion multimodal pre-training, achieving near-complete training efficiency parity between multimodal and text-only settings. Post-training employs scalable reinforcement learning frameworks supporting massive-scale agent scaffolds with progressive task complexity, enabling strong performance on agentic and tool-use benchmarks.
Deploy Qwen3.5 397B A17B on Vast.ai for access to frontier-level multimodal reasoning, coding, and agentic capabilities with flexible GPU infrastructure.
Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
Rent your dedicated instance preconfigured with the model you've selected.
Start sending requests to your model instance and getting responses right now.

© 2026 Vast.ai. All rights reserved.