
Qwen3.5 35B A3B

LLM · Reasoning · Vision Language · MoE

Efficient 35B MoE with 3B active params, unified vision-language reasoning


Details

  • Modalities: text, vision
  • Version: 3.5 35B-A3B
  • Recommended Hardware: 1xRTX PRO 6000 S
  • Provider: Alibaba
  • Family: Qwen
  • Parameters: 35B
  • Context: 262,144 tokens
  • License: apache-2.0

Qwen3.5 35B A3B: Unified Vision-Language MoE Reasoning Model

Qwen3.5 35B A3B is a multimodal mixture-of-experts foundation model from Alibaba's Qwen team, built on a hybrid Gated DeltaNet and sparse MoE architecture. It has 35 billion total parameters with 3 billion activated per token, delivering high-throughput, low-latency inference. The model was trained with early fusion on multimodal tokens to achieve native vision-language understanding alongside strong text reasoning, coding, and agentic capabilities.

Key Features

  • Unified Vision-Language Foundation - Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks
  • Efficient Hybrid Architecture - Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead
  • Scalable RL Generalization - Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability
  • Global Linguistic Coverage - Expanded support to 201 languages and dialects for inclusive worldwide deployment
  • Long Context - 262,144 tokens natively, extensible up to 1,010,000 tokens with YaRN
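
For earlier Qwen long-context releases, the YaRN extension mentioned in the last feature above is enabled by adding a rope_scaling entry to the model's config.json. The sketch below assumes Qwen3.5 follows the same convention; the path, the key names (which vary across transformers versions), and the scaling factor (derived from the 262,144 → 1,010,000 figures quoted above) are illustrative, not official values.

```python
import json

# Hypothetical sketch: enable YaRN context extension by editing the model's
# config.json, assuming Qwen3.5 follows the same convention as earlier Qwen
# long-context releases. The factor (~3.85 = 1,010,000 / 262,144) is derived
# from the numbers on this page, not taken from official documentation.
CONFIG_PATH = "Qwen3.5-35B-A3B/config.json"  # placeholder path

with open(CONFIG_PATH) as f:
    config = json.load(f)

config["rope_scaling"] = {
    "rope_type": "yarn",          # some transformers versions use "type" instead
    "factor": 1_010_000 / 262_144,  # target context / native context
    "original_max_position_embeddings": 262_144,  # native window
}

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```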

Architecture

  • Causal Language Model with Vision Encoder
  • 35B total parameters, 3B activated per token
  • 40 layers with a 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) hybrid layout (expanded in the sketch after this list)
  • Mixture of Experts with 256 experts, 8 routed + 1 shared activated
  • Multi-token prediction (MTP) trained with a multi-step objective
  • Native 262K context, extensible to 1M tokens
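
To make the layout notation concrete, here is a purely illustrative expansion of the 10-block pattern into its 40-layer sequence; the layer names are labels taken from the bullet above, not real module identifiers.

```python
# Illustrative expansion of the hybrid layout quoted above: 10 blocks, each
# containing 3 x (Gated DeltaNet -> MoE) followed by 1 x (Gated Attention -> MoE),
# for 40 layers in total.
BLOCKS = 10
PATTERN = ["Gated DeltaNet -> MoE"] * 3 + ["Gated Attention -> MoE"]

layers = PATTERN * BLOCKS
assert len(layers) == 40

for i, kind in enumerate(layers):
    print(f"layer {i:02d}: {kind}")

# Of the 40 layers, 30 use linear-attention (Gated DeltaNet) token mixing
# and 10 use standard gated attention.
print(layers.count("Gated DeltaNet -> MoE"), "DeltaNet layers")
print(layers.count("Gated Attention -> MoE"), "attention layers")
```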

Use Cases

  • Multimodal reasoning and visual question answering
  • Document, chart, and diagram understanding
  • Coding and software engineering agents
  • Tool-using agent workflows across long horizons (see the tool-calling sketch after this list)
  • Multilingual chat and instruction following across 201 languages
  • Long-context analysis and retrieval over large document sets
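
For the agentic workflows above, the sketch below shows standard OpenAI-style function calling against a deployed instance. It assumes the instance exposes an OpenAI-compatible /v1 endpoint, which is common for vLLM-style deployments but not stated on this page; the base URL, API key, model id, and get_weather tool are all placeholders.

```python
from openai import OpenAI

# Placeholders: substitute your instance URL and credentials; assumes the
# instance exposes an OpenAI-compatible /v1 API.
client = OpenAI(base_url="http://YOUR-INSTANCE:8000/v1", api_key="EMPTY")

# A single illustrative tool definition (standard OpenAI function-calling schema).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen3.5-35B-A3B",  # placeholder model id; check your deployment
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```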

Benchmarks

On the Qwen3.5 benchmark suite (source), Qwen3.5 35B A3B scores MMLU-Pro 85.3, MMLU-Redux 93.3, C-Eval 90.2, SuperGPQA 63.4, IFEval 91.9, GPQA Diamond 84.2, and LongBench v2 59.0, placing it competitively with much larger MoE peers while activating only 3B parameters per token.

Quick Start Guide

1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
2. Rent a dedicated instance preconfigured with the model you've selected.
3. Start sending requests to your model instance and get responses right away.
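
A minimal sketch of step three, again assuming the instance exposes an OpenAI-compatible endpoint; the base URL, API key, model id, and image URL are placeholders. The multimodal message format shown is the standard OpenAI vision layout, which vLLM-style servers commonly accept for vision-language deployments.

```python
from openai import OpenAI

# Placeholders: substitute your instance URL, API key, and deployed model id.
client = OpenAI(base_url="http://YOUR-INSTANCE:8000/v1", api_key="EMPTY")

# A vision-language request mixing an image and a text question.
response = client.chat.completions.create(
    model="Qwen3.5-35B-A3B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text",
             "text": "Summarize the trend shown in this chart."},
        ],
    }],
    max_tokens=512,
)

print(response.choices[0].message.content)
```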