
Qwen3.5 35B A3B

LLM · Reasoning · Vision Language · MoE

Efficient 35B MoE with 3B active params, unified vision-language reasoning


Details

  • Modalities: text, vision
  • Version: 3.5 35B-A3B
  • Recommended Hardware: 1xRTX PRO 6000 S
  • Provider: Alibaba
  • Family: Qwen
  • Parameters: 35B
  • Context: 262,144 tokens
  • License: apache-2.0

Qwen3.5 35B A3B: Unified Vision-Language MoE Reasoning Model

Qwen3.5 35B A3B is a multimodal mixture-of-experts foundation model from Alibaba's Qwen team, built on a hybrid Gated DeltaNet and sparse MoE architecture. It has 35 billion total parameters with 3 billion activated per token, delivering high-throughput, low-latency inference. The model was trained with early fusion on multimodal tokens to achieve native vision-language understanding alongside strong text reasoning, coding, and agentic capabilities.

Key Features

  • Unified Vision-Language Foundation - Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks
  • Efficient Hybrid Architecture - Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead
  • Scalable RL Generalization - Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability
  • Global Linguistic Coverage - Expanded support to 201 languages and dialects for inclusive worldwide deployment
  • Long Context - 262,144 tokens natively, extensible up to 1,010,000 tokens with YaRN
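
For earlier Qwen long-context releases, the YaRN extension mentioned in the last feature above is enabled by adding a rope_scaling entry to the model's config.json. The sketch below assumes Qwen3.5 follows the same convention; the path, the key names (which vary across transformers versions), and the scaling factor (derived from the 262,144 → 1,010,000 figures quoted above) are illustrative, not official values.

```python
import json

# Hypothetical sketch: enable YaRN context extension by editing the model's
# config.json, assuming Qwen3.5 follows the same convention as earlier Qwen
# long-context releases. The factor (~3.85 = 1,010,000 / 262,144) is derived
# from the numbers on this page, not taken from official documentation.
CONFIG_PATH = "Qwen3.5-35B-A3B/config.json"  # placeholder path

with open(CONFIG_PATH) as f:
    config = json.load(f)

config["rope_scaling"] = {
    "rope_type": "yarn",          # some transformers versions use "type" instead
    "factor": 1_010_000 / 262_144,  # target context / native context
    "original_max_position_embeddings": 262_144,  # native window
}

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```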

Architecture

  • Causal Language Model with Vision Encoder
  • 35B total parameters, 3B activated per token
  • 40 layers with a 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)) hybrid layout (expanded in the sketch after this list)
  • Mixture of Experts with 256 experts, 8 routed + 1 shared activated
  • Multi-token prediction (MTP) trained with a multi-step objective
  • Native 262K context, extensible to 1M tokens
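
To make the layout notation concrete, here is a purely illustrative expansion of the 10-block pattern into its 40-layer sequence; the layer names are labels taken from the bullet above, not real module identifiers.

```python
# Illustrative expansion of the hybrid layout quoted above: 10 blocks, each
# containing 3 x (Gated DeltaNet -> MoE) followed by 1 x (Gated Attention -> MoE),
# for 40 layers in total.
BLOCKS = 10
PATTERN = ["Gated DeltaNet -> MoE"] * 3 + ["Gated Attention -> MoE"]

layers = PATTERN * BLOCKS
assert len(layers) == 40

for i, kind in enumerate(layers):
    print(f"layer {i:02d}: {kind}")

# Of the 40 layers, 30 use linear-attention (Gated DeltaNet) token mixing
# and 10 use standard gated attention.
print(layers.count("Gated DeltaNet -> MoE"), "DeltaNet layers")
print(layers.count("Gated Attention -> MoE"), "attention layers")
```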

Use Cases

  • Multimodal reasoning and visual question answering
  • Document, chart, and diagram understanding
  • Coding and software engineering agents
  • Tool-using agent workflows across long horizons (see the tool-calling sketch after this list)
  • Multilingual chat and instruction following across 201 languages
  • Long-context analysis and retrieval over large document sets
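
For the agentic workflows above, the sketch below shows standard OpenAI-style function calling against a deployed instance. It assumes the instance exposes an OpenAI-compatible /v1 endpoint, which is common for vLLM-style deployments but not stated on this page; the base URL, API key, model id, and get_weather tool are all placeholders.

```python
from openai import OpenAI

# Placeholders: substitute your instance URL and credentials; assumes the
# instance exposes an OpenAI-compatible /v1 API.
client = OpenAI(base_url="http://YOUR-INSTANCE:8000/v1", api_key="EMPTY")

# A single illustrative tool definition (standard OpenAI function-calling schema).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen3.5-35B-A3B",  # placeholder model id; check your deployment
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```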

Benchmarks

On the Qwen3.5 benchmark suite (source), Qwen3.5 35B A3B scores MMLU-Pro 85.3, MMLU-Redux 93.3, C-Eval 90.2, SuperGPQA 63.4, IFEval 91.9, GPQA Diamond 84.2, and LongBench v2 59.0, placing it competitively with much larger MoE peers while activating only 3B parameters per token.

Quick Start Guide

1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
2. Rent a dedicated instance preconfigured with the model you've selected.
3. Start sending requests to your model instance and get responses right away.
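
A minimal sketch of step three, again assuming the instance exposes an OpenAI-compatible endpoint; the base URL, API key, model id, and image URL are placeholders. The multimodal message format shown is the standard OpenAI vision layout, which vLLM-style servers commonly accept for vision-language deployments.

```python
from openai import OpenAI

# Placeholders: substitute your instance URL, API key, and deployed model id.
client = OpenAI(base_url="http://YOUR-INSTANCE:8000/v1", api_key="EMPTY")

# A vision-language request mixing an image and a text question.
response = client.chat.completions.create(
    model="Qwen3.5-35B-A3B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text",
             "text": "Summarize the trend shown in this chart."},
        ],
    }],
    max_tokens=512,
)

print(response.choices[0].message.content)
```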