Qwen3.6 35B A3B

LLM · Vision Language · MoE · Reasoning · Coding

Agentic coding MoE with hybrid Gated DeltaNet and vision support

Details

  • Modalities: text, vision
  • Recommended Hardware: 1xRTX PRO 6000 S
  • Provider: Alibaba
  • Family: Qwen3.6
  • Parameters: 35B
  • Context: 262,144 tokens
  • License: apache-2.0

Qwen3.6 35B A3B: Agentic Coding with Hybrid Gated DeltaNet

Qwen3.6 35B A3B is the first open-weight model in the Qwen3.6 series, built on direct community feedback and focused on stability and real-world utility. It combines a hybrid Gated DeltaNet and Gated Attention architecture with sparse Mixture-of-Experts routing and a vision encoder for unified multimodal reasoning.

Key Features

  • Agentic Coding - Handles frontend workflows and repository-level reasoning with improved fluency and precision over earlier Qwen generations
  • Thinking Preservation - New option to retain reasoning context from historical messages, streamlining iterative development and reducing redundant token generation
  • Hybrid Architecture - Alternating Gated DeltaNet and Gated Attention blocks combined with sparse MoE, balancing long-context efficiency against attention precision
  • Sparse Mixture-of-Experts - 256 total experts with 8 routed and 1 shared expert active per token, delivering 35B total capacity with only 3B active parameters
  • Multi-Token Prediction - Trained with multi-step MTP, enabling speculative decoding for lower-latency inference
  • Native 262K Context - Handles 262,144 tokens natively, extensible up to 1,010,000 tokens via YaRN RoPE scaling
  • Multimodal Inputs - Unified vision-language model supporting text, image, and video inputs
  • Tool Calling - Native tool-calling support with the qwen3_coder parser for agent workflows (see the sketch after this list)
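
As a sketch of how the tool-calling feature above can be exercised once the model sits behind an OpenAI-compatible endpoint (for example, vLLM launched with `--enable-auto-tool-choice --tool-call-parser qwen3_coder`), the snippet below registers one tool and lets the model decide whether to call it. The endpoint URL, API key, served model name, and the `run_tests` tool itself are placeholders for illustration, not values confirmed on this page.

```python
# Sketch: tool calling against an OpenAI-compatible endpoint.
# Assumes the server was started with tool calling enabled, e.g.:
#   vllm serve <model> --enable-auto-tool-choice --tool-call-parser qwen3_coder
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the repository's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Directory to test"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3.6-35B-A3B",  # placeholder served-model name
    messages=[{"role": "user", "content": "Run the tests under ./src and summarize failures."}],
    tools=tools,
)

# Print any tool calls the model decided to emit.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```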

Benchmark Performance

Coding and Software Engineering:

  • SWE-bench Verified: 73.4
  • SWE-bench Multilingual: 67.2
  • SWE-bench Pro: 49.5
  • Terminal-Bench 2.0: 51.5
  • LiveCodeBench v6: 80.4
  • NL2Repo: 29.4
  • QwenClawBench: 52.6

General Agent and Tool Use:

  • TAU3-Bench: 67.2
  • DeepPlanning: 25.9
  • MCPMark: 37.0
  • MCP-Atlas: 62.8
  • WideSearch: 60.1

Knowledge:

  • MMLU-Pro: 85.2
  • MMLU-Redux: 93.3
  • SuperGPQA: 64.7
  • C-Eval: 90.0

STEM and Reasoning:

  • GPQA: 86.0
  • HLE: 21.4
  • HMMT Feb 25: 90.7
  • HMMT Nov 25: 89.1
  • HMMT Feb 26: 83.6
  • IMOAnswerBench: 78.9
  • AIME26: 92.6

Use Cases

  • Agentic coding tasks across frontend, backend, and repository-level workflows
  • Multi-turn agent scenarios where preserved reasoning context improves decision consistency
  • Tool-calling and MCP-based automation
  • Competition-level mathematics and STEM reasoning
  • Long-context document analysis up to 262K tokens natively
  • Visual question answering and image-grounded reasoning (see the sketch after this list)
  • Video understanding with configurable frame sampling
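
As a minimal sketch of the visual question answering use case above, image inputs can be passed as `image_url` content parts through the OpenAI-compatible chat API. The endpoint, served model name, and image URL are placeholders.

```python
# Sketch: visual question answering via an OpenAI-compatible chat endpoint.
# Endpoint, model name, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen3.6-35B-A3B",  # placeholder served-model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }],
)
print(resp.choices[0].message.content)
```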

Architecture

Qwen3.6 35B A3B uses a 40-layer hybrid architecture organized as ten cycles of three Gated DeltaNet blocks followed by one Gated Attention block, each paired with a sparse Mixture-of-Experts feed-forward layer.

Gated DeltaNet provides linear-attention efficiency with a fixed-size recurrent state, keeping long-context compute and memory costs tractable. The interleaved Gated Attention blocks use 16 query heads and 2 key-value heads with a head dimension of 256 and 64-dimensional rotary position embeddings, preserving precise token-level attention where it is most valuable.
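
One practical consequence of the 3:1 hybrid layout is a small KV cache: only the 10 Gated Attention layers (one per four-layer cycle) store per-token keys and values, while the 30 Gated DeltaNet layers keep a fixed-size state. The back-of-the-envelope estimate below uses only figures stated above; the fp16 storage assumption and the exclusion of the DeltaNet state are mine.

```python
# Rough KV-cache estimate at the full 262,144-token native context.
# Uses the figures stated above; fp16 storage is an assumption.
layers = 40
attn_layers = layers // 4          # one Gated Attention block per 4-layer cycle -> 10
kv_heads = 2                       # key-value heads per attention layer
head_dim = 256                     # dimensions per head
bytes_fp16 = 2
context = 262_144

per_token = attn_layers * kv_heads * head_dim * 2 * bytes_fp16  # x2 for K and V
total_gib = per_token * context / 1024**3
print(f"{per_token} bytes/token, ~{total_gib:.1f} GiB at full context")
# -> 20480 bytes/token, ~5.0 GiB; a dense 40-layer attention stack
#    with the same heads would need 4x that.
```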

The Mixture-of-Experts layer routes each token through 8 of 256 available experts plus 1 shared expert, with a 512-dimensional expert intermediate size. The model is trained with Multi-Token Prediction across multiple steps, enabling speculative decoding at inference time.
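
To make the 35B-total / 3B-active split concrete, the sketch below estimates the MoE feed-forward parameters activated per token from the numbers above (2048 hidden size, 512 expert intermediate size, 8 routed plus 1 shared expert out of 256). The three-matrix SwiGLU expert shape is my assumption, so treat this as an order-of-magnitude illustration rather than an official breakdown.

```python
# Order-of-magnitude sketch of active vs. total MoE FFN parameters.
# Assumes each expert is a 3-matrix SwiGLU FFN (gate/up/down) -- an
# assumption, not a detail confirmed on this page.
hidden = 2048
expert_inter = 512
experts_total = 256
experts_active = 8 + 1   # 8 routed + 1 shared per token
layers = 40

per_expert = 3 * hidden * expert_inter          # ~3.1M params per expert
active_ffn = layers * experts_active * per_expert
total_ffn = layers * experts_total * per_expert
print(f"active FFN ~{active_ffn/1e9:.2f}B of ~{total_ffn/1e9:.1f}B total FFN params")
# -> roughly 1.13B active FFN vs ~32B total; attention, embeddings, and the
#    vision encoder account for the rest of the 3B-active / 35B-total budget.
```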

A 2048-dimensional language backbone pairs with a vision encoder to form a unified multimodal model, supporting a 248,320-token padded vocabulary and handling text, image, and video inputs through a shared representation.

Deploy Qwen3.6 35B A3B on Vast.ai with vLLM, SGLang, or llama.cpp for efficient agentic coding, long-context reasoning, and multimodal inference on flexible GPU infrastructure.
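
If you need the extended 1,010,000-token window rather than the native 262K, the usual Qwen-style route is a YaRN `rope_scaling` block in the checkpoint's config.json (or the equivalent `--rope-scaling` JSON passed to vLLM). The snippet below follows the convention documented for earlier Qwen3 releases; the exact keys for this model are an assumption, and the factor is simply 1,010,000 / 262,144. Verify against the official model card before relying on it.

```python
# Sketch: enabling YaRN context extension by editing the checkpoint config.
# Key names follow the earlier-Qwen3 convention -- an assumption here.
import json

cfg_path = "Qwen3.6-35B-A3B/config.json"  # placeholder local checkpoint path
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 1_010_000 / 262_144,                # ~3.85x over the native window
    "original_max_position_embeddings": 262_144,  # native context length
}

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```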

Quick Start Guide

  1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
  2. Rent a dedicated instance preconfigured with the model you've selected.
  3. Start sending requests to your model instance and get responses right away, as sketched below.
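
Once the instance is up, requests go to its OpenAI-compatible endpoint. A minimal first-request sketch, streaming tokens as they arrive; the instance address, API key, and served model name are placeholders:

```python
# Minimal streaming request to a deployed instance (placeholder URL/key/model).
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-INSTANCE-ADDRESS/v1",  # placeholder instance address
    api_key="YOUR-API-KEY",                       # placeholder key
)

stream = client.chat.completions.create(
    model="Qwen3.6-35B-A3B",  # placeholder served-model name
    messages=[{"role": "user", "content": "Summarize what Gated DeltaNet does in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```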