Qwen3.6 35B A3B: Agentic Coding with Hybrid Gated DeltaNet
Qwen3.6 35B A3B is the first open-weight model in the Qwen3.6 series, built on direct community feedback and focused on stability and real-world utility. It combines a hybrid Gated DeltaNet and Gated Attention architecture with sparse Mixture-of-Experts routing and a vision encoder for unified multimodal reasoning.
Key Features
- Agentic Coding - Handles frontend workflows and repository-level reasoning with improved fluency and precision over earlier Qwen generations
- Thinking Preservation - New option to retain reasoning context from historical messages, streamlining iterative development and reducing redundant token generation
- Hybrid Architecture - Alternating Gated DeltaNet and Gated Attention blocks combined with sparse MoE, balancing long-context efficiency against attention precision
- Sparse Mixture-of-Experts - 256 total experts with 8 routed and 1 shared expert active per token, delivering 35B total capacity with only 3B active parameters
- Multi-Token Prediction - Trained with multi-step MTP, enabling speculative decoding for lower-latency inference
- Native 262K Context - Handles 262,144 tokens natively, extensible up to 1,010,000 tokens via YaRN RoPE scaling (see the configuration sketch after this list)
- Multimodal Inputs - Unified vision-language model supporting text, image, and video inputs
- Tool Calling - Native tool-calling support via the qwen3_coder parser for agent workflows
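Extending past the native window is typically done by adding a YaRN rope_scaling block to the checkpoint's config.json before serving. Here is a minimal sketch, assuming the rope_scaling convention used by earlier Qwen releases; the local path and the exact scaling factor are illustrative assumptions:

```python
import json

# Minimal sketch: enable YaRN RoPE scaling by editing config.json.
# The checkout path and scaling factor are assumptions, not official values.
cfg_path = "Qwen3.6-35B-A3B/config.json"  # hypothetical local checkout
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["rope_scaling"] = {
    "rope_type": "yarn",
    # Assumed factor: target length / native length = 1,010,000 / 262,144
    "factor": 1_010_000 / 262_144,
    "original_max_position_embeddings": 262_144,
}

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```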
Benchmark Performance
Coding and Software Engineering:
- SWE-bench Verified: 73.4
- SWE-bench Multilingual: 67.2
- SWE-bench Pro: 49.5
- Terminal-Bench 2.0: 51.5
- LiveCodeBench v6: 80.4
- NL2Repo: 29.4
- QwenClawBench: 52.6
General Agent and Tool Use:
- TAU3-Bench: 67.2
- DeepPlanning: 25.9
- MCPMark: 37.0
- MCP-Atlas: 62.8
- WideSearch: 60.1
Knowledge:
- MMLU-Pro: 85.2
- MMLU-Redux: 93.3
- SuperGPQA: 64.7
- C-Eval: 90.0
STEM and Reasoning:
- GPQA: 86.0
- HLE: 21.4
- HMMT Feb 25: 90.7
- HMMT Nov 25: 89.1
- HMMT Feb 26: 83.6
- IMOAnswerBench: 78.9
- AIME26: 92.6
Use Cases
- Agentic coding tasks across frontend, backend, and repository-level workflows
- Multi-turn agent scenarios where preserved reasoning context improves decision consistency
- Tool-calling and MCP-based automation (see the request sketch after this list)
- Competition-level mathematics and STEM reasoning
- Long-context document analysis up to 262K tokens natively
- Visual question answering and image-grounded reasoning
- Video understanding with configurable frame sampling
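To illustrate the tool-calling use case above, here is a hedged sketch of a tools request against an OpenAI-compatible endpoint (for example, vLLM launched with --tool-call-parser qwen3_coder); the base URL, served model name, and the read_file tool are placeholders, not part of any official example:

```python
from openai import OpenAI

# Hedged sketch of native tool calling through an OpenAI-compatible server.
# Endpoint, API key, and served model name are deployment-specific placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool, for illustration only
        "description": "Read a file from the repository.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3.6-35B-A3B",  # hypothetical served model name
    messages=[{"role": "user", "content": "Open README.md and summarize it."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```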
Architecture
Qwen3.6 35B A3B uses a 40-layer hybrid architecture organized as ten repeating cycles of three Gated DeltaNet blocks followed by one Gated Attention block, with every block paired with a sparse Mixture-of-Experts feed-forward layer.
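A small sketch of that layout, derived directly from the counts stated above (ten cycles, a 3:1 DeltaNet-to-attention ratio, 40 layers total):

```python
# Layer layout implied by the description: ten cycles of three Gated DeltaNet
# blocks followed by one Gated Attention block, 40 layers in total.
NUM_CYCLES = 10
CYCLE = ["gated_deltanet"] * 3 + ["gated_attention"]

layers = [block for _ in range(NUM_CYCLES) for block in CYCLE]
assert len(layers) == 40
assert layers.count("gated_attention") == 10  # one full-attention block per cycle
```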
Gated DeltaNet provides linear-attention efficiency with a fixed-size recurrent state, keeping long-context compute and memory costs tractable. The interleaved Gated Attention blocks use 16 query heads and 2 key-value heads, each 256-dimensional, with a 64-dimensional rotary position embedding, preserving precise token-level attention where it is most valuable.
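The grouped-query attention shapes described here can be checked with a minimal sketch; the dimensions below are taken from the stated configuration, while the 64-dimensional partial RoPE and the gating itself are omitted:

```python
import torch

# Shape sketch of the Gated Attention blocks: 16 query heads share 2 key-value
# heads (grouped-query attention) with 256-dimensional heads. The 64-dim
# partial rotary embedding and the output gate are not modeled here.
batch, seq = 1, 8
n_q_heads, n_kv_heads, head_dim = 16, 2, 256

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Each of the 2 KV heads serves 16 // 2 = 8 query heads.
k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)

out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
assert out.shape == (batch, n_q_heads, seq, head_dim)
```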
The Mixture-of-Experts layer routes each token through 8 of 256 available experts plus 1 shared expert, with a 512-dimensional expert intermediate size. The model is trained with Multi-Token Prediction across multiple steps, enabling speculative decoding at inference time.
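A minimal sketch of that routing step, using the dimensions stated above; renormalizing the top-k weights and the dispatch details are assumptions, since the text gives only the expert counts and sizes:

```python
import torch

# Top-k expert routing as described: each token selects 8 of 256 routed
# experts; 1 shared expert is applied to every token unconditionally.
n_experts, top_k, d_model, d_expert = 256, 8, 2048, 512

router = torch.nn.Linear(d_model, n_experts, bias=False)
tokens = torch.randn(4, d_model)  # 4 example token representations

probs = router(tokens).softmax(dim=-1)
weights, chosen = torch.topk(probs, top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize (assumed)

# chosen[i] holds the 8 expert indices for token i; a real layer dispatches
# each token to those experts' 512-dim FFNs, mixes the outputs by `weights`,
# and adds the shared expert's output.
print(chosen.shape, weights.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```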
A 2048-dimensional language backbone pairs with a vision encoder to form a unified multimodal model, supporting a 248,320-token padded vocabulary and handling text, image, and video inputs through a shared representation.
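For the multimodal path, a hedged sketch of an image-grounded request through an OpenAI-compatible endpoint; the base URL, served model name, and image URL are placeholders for your own deployment:

```python
from openai import OpenAI

# Hedged sketch of visual question answering against an OpenAI-compatible
# server (e.g. vLLM or SGLang). All endpoint details are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen3.6-35B-A3B",  # hypothetical served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Summarize the trend shown in this chart."},
        ],
    }],
)
print(resp.choices[0].message.content)
```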
Deploy Qwen3.6 35B A3B on Vast.ai with vLLM, SGLang, or llama.cpp for efficient agentic coding, long-context reasoning, and multimodal inference on flexible GPU infrastructure.
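As a starting point, a hedged offline-inference sketch with vLLM's Python API; the Hugging Face model ID is a placeholder, and parallelism and context length should match your hardware:

```python
from vllm import LLM, SamplingParams

# Hedged sketch of offline inference with vLLM. The model ID is a placeholder;
# adjust tensor parallelism and max context length to your GPU setup.
llm = LLM(
    model="Qwen/Qwen3.6-35B-A3B",  # hypothetical repository name
    tensor_parallel_size=2,
    max_model_len=262_144,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Write a Python function that merges two sorted lists."], params
)
print(outputs[0].outputs[0].text)
```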