Qwen3 thinking model
Modality: text
Version: 2507
Recommended hardware: 8×H200
Developer: Alibaba
Model family: Qwen3
Total parameters: 235B
Context length: 262,144 tokens
License: Apache 2.0
Qwen3 235B A22B Thinking 2507 is a mixture-of-experts (MoE) language model specifically designed for extended reasoning tasks. With 235 billion total parameters and 22 billion activated parameters per token, this model represents Alibaba's approach to transparent reasoning processes in large language models.
The model employs a distinctive architecture featuring 94 layers with 128 total experts, activating 8 experts per token. A defining characteristic is its mandatory thinking mode: the model automatically includes reasoning tokens in all outputs through an enforced <think> tag in the chat template. This design makes the model's internal reasoning process visible, enabling users to understand how conclusions are reached.
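As a minimal sketch of what this looks like in practice (assuming the publicly released Qwen/Qwen3-235B-A22B-Thinking-2507 checkpoint on Hugging Face and the standard transformers generation API; exact tag names follow the released chat template), the reasoning block can be separated from the final answer at the closing </think> tag:

```python
# Minimal sketch; assumes the public Hugging Face checkpoint and transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-235B-A22B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Why is the sky blue?"}]
# The chat template itself opens the <think> block, so every completion
# begins with reasoning tokens and closes them with </think>.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=8192)[0][len(inputs.input_ids[0]):].tolist()

# Split the visible reasoning from the final answer at the last </think> token.
end_think = tokenizer.convert_tokens_to_ids("</think>")
split = len(output_ids) - output_ids[::-1].index(end_think) if end_think in output_ids else 0
reasoning = tokenizer.decode(output_ids[:split], skip_special_tokens=True)
answer = tokenizer.decode(output_ids[split:], skip_special_tokens=True)
print(reasoning)
print(answer)
```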
The architecture incorporates grouped-query attention (GQA) with 64 query heads and 4 key-value heads, balancing computational efficiency against attention quality. The model natively supports a context length of 262,144 tokens, expandable to 1 million tokens with specialized configuration.
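To see why 4 key-value heads matter at this context length, a back-of-the-envelope estimate (the head dimension of 128 and bf16 cache are assumptions, not figures from this page) compares the per-sequence KV-cache footprint against a hypothetical 64-head cache:

```python
# Back-of-the-envelope KV-cache estimate for grouped-query attention (GQA).
# Assumed, not taken from this page: head_dim = 128, bf16 cache (2 bytes/element).
layers = 94
head_dim = 128
bytes_per_elem = 2
context = 262_144

def kv_cache_gib(num_kv_heads: int) -> float:
    # keys + values (factor 2), per layer, per token, across the full context
    return 2 * layers * num_kv_heads * head_dim * bytes_per_elem * context / 1024**3

print(f"4 KV heads (GQA):       {kv_cache_gib(4):.0f} GiB per full-context sequence")
print(f"64 KV heads (full MHA): {kv_cache_gib(64):.0f} GiB per full-context sequence")
# With 4 KV heads the cache is 16x smaller than a hypothetical 64-head cache.
```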
Qwen3 235B Thinking implements dual chunk attention and MInference sparse attention mechanisms for efficient processing of ultra-long sequences. These optimizations deliver up to a 3× speedup compared to standard attention implementations, making extended reasoning over large documents practical for production environments.
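The details of dual chunk attention and MInference are beyond this overview, but a simplified counting exercise (illustrative chunk size; this is not the actual attention pattern either mechanism uses) gives the intuition for why restricting attention to chunks shrinks the quadratic cost of dense attention:

```python
# Simplified intuition only: count query-key score pairs for dense attention
# versus an intra-chunk-only pattern. The real DCA/MInference mechanisms also
# add inter-chunk and dynamically selected sparse attention.
seq_len = 262_144
chunk = 32_768            # assumed chunk size for illustration

dense_pairs = seq_len ** 2
chunked_pairs = (seq_len // chunk) * chunk ** 2   # each token attends within its chunk

print(f"dense attention:   {dense_pairs:.2e} score pairs")
print(f"chunked attention: {chunked_pairs:.2e} score pairs")
print(f"reduction:         {dense_pairs / chunked_pairs:.0f}x fewer pairs")
```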
The model achieves state-of-the-art results among open-source thinking models across multiple reasoning domains, reflecting particular strength in tasks that require multi-step reasoning and complex problem-solving.
Beyond pure reasoning, the model features enhanced tool-calling functionality optimized for agentic workflows. Integration with the Qwen-Agent framework enables the model to function as an orchestration layer in multi-step agent applications, coordinating external tools and reasoning about action sequences.
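A hedged sketch of such an orchestration setup, based on the Qwen-Agent project's published usage pattern (the served model name, endpoint URL, and tool list below are placeholders, not values from this page):

```python
# Hypothetical Qwen-Agent setup; model name, endpoint, and tools are placeholders.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-235B-A22B-Thinking-2507",    # name exposed by the inference server (assumed)
    "model_server": "http://localhost:8000/v1",   # any OpenAI-compatible endpoint (assumed)
    "api_key": "EMPTY",
}

# The agent coordinates tool calls and feeds results back into the model's reasoning.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Compute the first 20 Fibonacci numbers and plot them."}]
responses = []
for responses in bot.run(messages=messages):   # run() streams intermediate agent steps
    pass
print(responses[-1])                            # final assistant message after tool use
```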
The model demonstrates improved instruction-following and alignment capabilities across 81 languages, making it suitable for global deployment scenarios requiring consistent reasoning quality across linguistic boundaries.
The model excels in applications that require a transparent, auditable reasoning process.
The model's thinking mode is mandatory and cannot be disabled. All outputs incorporate visible reasoning tokens, which increases token consumption compared to traditional language models. Applications should account for this characteristic when designing user experiences and managing computational costs.
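For capacity planning, a simple estimate (all figures below are placeholder assumptions, not measurements of this model) illustrates how visible reasoning tokens multiply output-token consumption:

```python
# Placeholder figures for illustration only; measure your own traffic before budgeting.
avg_answer_tokens = 400        # assumed length of the user-visible answer
avg_reasoning_tokens = 3_000   # assumed length of the <think> block
requests_per_day = 10_000      # assumed traffic

total_output = avg_answer_tokens + avg_reasoning_tokens
multiplier = total_output / avg_answer_tokens

print(f"output tokens per request: {total_output}")
print(f"~{multiplier:.1f}x the tokens of the final answer alone")
print(f"daily output tokens: {total_output * requests_per_day:,}")
```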