Qwen3 thinking model
Modality: text
Version: 2507
Recommended hardware: 8×H200
Developer: Alibaba
Model family: Qwen3
Total parameters: 235B
Context length: 262,144 tokens
License: Apache 2.0
Qwen3 235B A22B Thinking 2507 is a mixture-of-experts (MoE) language model specifically designed for extended reasoning tasks. With 235 billion total parameters and 22 billion activated parameters per token, this model represents Alibaba's approach to transparent reasoning processes in large language models.
The model employs a distinctive architecture featuring 94 layers with 128 total experts, activating 8 experts per token. A defining characteristic is its mandatory thinking mode: the model automatically includes reasoning tokens in all outputs through an enforced <think> tag in the chat template. This design makes the model's internal reasoning process visible, enabling users to understand how conclusions are reached.
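As a minimal sketch of what this looks like in practice (assuming the publicly released Qwen/Qwen3-235B-A22B-Thinking-2507 checkpoint on Hugging Face and the standard transformers generation API; exact tag names follow the released chat template), the reasoning block can be separated from the final answer at the closing </think> tag:

```python
# Minimal sketch; assumes the public Hugging Face checkpoint and transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-235B-A22B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Why is the sky blue?"}]
# The chat template itself opens the <think> block, so every completion
# begins with reasoning tokens and closes them with </think>.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=8192)[0][len(inputs.input_ids[0]):].tolist()

# Split the visible reasoning from the final answer at the last </think> token.
end_think = tokenizer.convert_tokens_to_ids("</think>")
split = len(output_ids) - output_ids[::-1].index(end_think) if end_think in output_ids else 0
reasoning = tokenizer.decode(output_ids[:split], skip_special_tokens=True)
answer = tokenizer.decode(output_ids[split:], skip_special_tokens=True)
print(reasoning)
print(answer)
```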
The architecture incorporates grouped-query attention (GQA) with 64 query heads and 4 key-value heads, balancing computational efficiency against attention quality. The model natively supports a context length of 262,144 tokens, expandable to 1 million tokens with specialized configuration.
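To see why 4 key-value heads matter at this context length, a back-of-the-envelope estimate (the head dimension of 128 and bf16 cache are assumptions, not figures from this page) compares the per-sequence KV-cache footprint against a hypothetical 64-head cache:

```python
# Back-of-the-envelope KV-cache estimate for grouped-query attention (GQA).
# Assumed, not taken from this page: head_dim = 128, bf16 cache (2 bytes/element).
layers = 94
head_dim = 128
bytes_per_elem = 2
context = 262_144

def kv_cache_gib(num_kv_heads: int) -> float:
    # keys + values (factor 2), per layer, per token, across the full context
    return 2 * layers * num_kv_heads * head_dim * bytes_per_elem * context / 1024**3

print(f"4 KV heads (GQA):       {kv_cache_gib(4):.0f} GiB per full-context sequence")
print(f"64 KV heads (full MHA): {kv_cache_gib(64):.0f} GiB per full-context sequence")
# With 4 KV heads the cache is 16x smaller than a hypothetical 64-head cache.
```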
Qwen3 235B Thinking implements dual chunk attention and MInference sparse attention mechanisms for efficient processing of ultra-long sequences. These optimizations deliver up to a 3× speedup compared to standard attention implementations, making extended reasoning over large documents practical for production environments.
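The details of dual chunk attention and MInference are beyond this overview, but a simplified counting exercise (illustrative chunk size; this is not the actual attention pattern either mechanism uses) gives the intuition for why restricting attention to chunks shrinks the quadratic cost of dense attention:

```python
# Simplified intuition only: count query-key score pairs for dense attention
# versus an intra-chunk-only pattern. The real DCA/MInference mechanisms also
# add inter-chunk and dynamically selected sparse attention.
seq_len = 262_144
chunk = 32_768            # assumed chunk size for illustration

dense_pairs = seq_len ** 2
chunked_pairs = (seq_len // chunk) * chunk ** 2   # each token attends within its chunk

print(f"dense attention:   {dense_pairs:.2e} score pairs")
print(f"chunked attention: {chunked_pairs:.2e} score pairs")
print(f"reduction:         {dense_pairs / chunked_pairs:.0f}x fewer pairs")
```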
The model achieves state-of-the-art results among open-source thinking models across multiple reasoning domains, reflecting particular strength in tasks that require multi-step reasoning and complex problem-solving.
Beyond pure reasoning, the model features enhanced tool-calling functionality optimized for agentic workflows. Integration with the Qwen-Agent framework enables the model to function as an orchestration layer in multi-step agent applications, coordinating external tools and reasoning about action sequences.
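A hedged sketch of such an orchestration setup, based on the Qwen-Agent project's published usage pattern (the served model name, endpoint URL, and tool list below are placeholders, not values from this page):

```python
# Hypothetical Qwen-Agent setup; model name, endpoint, and tools are placeholders.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-235B-A22B-Thinking-2507",    # name exposed by the inference server (assumed)
    "model_server": "http://localhost:8000/v1",   # any OpenAI-compatible endpoint (assumed)
    "api_key": "EMPTY",
}

# The agent coordinates tool calls and feeds results back into the model's reasoning.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Compute the first 20 Fibonacci numbers and plot them."}]
responses = []
for responses in bot.run(messages=messages):   # run() streams intermediate agent steps
    pass
print(responses[-1])                            # final assistant message after tool use
```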
The model demonstrates improved instruction-following and alignment capabilities across 81 languages, making it suitable for global deployment scenarios requiring consistent reasoning quality across linguistic boundaries.
The model excels in applications that require a transparent, auditable reasoning process.
The model's thinking mode is mandatory and cannot be disabled. All outputs incorporate visible reasoning tokens, which increases token consumption compared to traditional language models. Applications should account for this characteristic when designing user experiences and managing computational costs.
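For capacity planning, a simple estimate (all figures below are placeholder assumptions, not measurements of this model) illustrates how visible reasoning tokens multiply output-token consumption:

```python
# Placeholder figures for illustration only; measure your own traffic before budgeting.
avg_answer_tokens = 400        # assumed length of the user-visible answer
avg_reasoning_tokens = 3_000   # assumed length of the <think> block
requests_per_day = 10_000      # assumed traffic

total_output = avg_answer_tokens + avg_reasoning_tokens
multiplier = total_output / avg_answer_tokens

print(f"output tokens per request: {total_output}")
print(f"~{multiplier:.1f}x the tokens of the final answer alone")
print(f"daily output tokens: {total_output * requests_per_day:,}")
```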