
Qwen3 235B A22B Thinking 2507

LLM
Reasoning

Qwen3 thinking model

On-Demand Dedicated 8xH200

Details

Modalities: text
Version: 2507
Recommended Hardware: 8xH200
Provider: Alibaba
Family: Qwen3
Parameters: 235B
Context: 262,144 tokens
License: Apache 2.0

Qwen3 235B A22B Thinking 2507: Advanced Reasoning Language Model

Qwen3 235B A22B Thinking 2507 is a mixture-of-experts (MoE) language model specifically designed for extended reasoning tasks. With 235 billion total parameters and 22 billion activated parameters per token, this model represents Alibaba's approach to transparent reasoning processes in large language models.

Architecture and Thinking Design

The model employs a distinctive architecture featuring 94 layers with 128 total experts, activating 8 experts per token. A defining characteristic is its mandatory thinking mode: the model automatically includes reasoning tokens in all outputs through an enforced <think> tag in the chat template. This design makes the model's internal reasoning process visible, enabling users to understand how conclusions are reached.
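
As a rough illustration of the enforced template, the sketch below uses the transformers library with the public Hugging Face checkpoint name Qwen/Qwen3-235B-A22B-Thinking-2507 (an assumption about how the model is published; only the tokenizer is fetched, not the 235B weights) to render a prompt and show that it ends with an opening <think> tag:

```python
# Minimal sketch: inspect the enforced thinking chat template.
# Assumes the transformers library and the public checkpoint
# "Qwen/Qwen3-235B-A22B-Thinking-2507"; only the tokenizer is
# downloaded here, not the 235B model weights.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Thinking-2507")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# add_generation_prompt=True appends the assistant turn header; for this
# model the template is expected to end with an opening <think> tag, so
# reasoning tokens lead every completion and the model emits only the
# closing </think> before its final answer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```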

The architecture incorporates group query attention with 64 query heads and 4 key-value heads, optimizing the balance between computational efficiency and reasoning capability. The model natively supports a context length of 262,144 tokens, expandable to 1 million tokens with specialized configuration.
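
To see why grouped-query attention matters at this context length, here is a back-of-the-envelope sketch. The head counts, layer count, and context length come from the description above; the head dimension (128) and FP16 cache precision are assumptions made for illustration only.

```python
# Back-of-the-envelope KV-cache estimate for a 262,144-token context.
# Assumptions (not from the page): head_dim = 128, 2 bytes per value (FP16).
layers = 94
kv_heads = 4          # grouped-query attention
query_heads = 64
head_dim = 128        # assumed
bytes_per_value = 2   # assumed FP16 cache
context = 262_144

# Both K and V are cached for every layer and every KV head.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * context
print(f"KV cache at full context: {kv_cache_bytes / 1e9:.1f} GB")

# With full multi-head attention (64 KV heads) the cache would be 16x larger.
mha_bytes = kv_cache_bytes * (query_heads // kv_heads)
print(f"Hypothetical MHA cache:   {mha_bytes / 1e9:.1f} GB")
```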

Long-Context Processing

Qwen3 235B Thinking implements dual chunk attention and MInference sparse attention mechanisms for efficient processing of ultra-long sequences. These optimizations deliver up to a 3× speedup compared to standard attention implementations, making extended reasoning over large documents practical for production environments.
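
The motivation is the quadratic cost of dense attention; the toy calculation below is purely illustrative (the 3× figure above is the vendor's claim, and nothing here models MInference itself), but it shows how pairwise score work grows with sequence length and why sparse or chunked attention pays off at 256K+ tokens.

```python
# Illustrative only: dense attention score computation grows quadratically
# with sequence length. Relative numbers, not actual FLOP counts.
base = 8_192
for seq_len in (8_192, 65_536, 262_144, 1_000_000):
    growth_in_length = seq_len / base
    growth_in_cost = (seq_len / base) ** 2
    print(f"{seq_len:>9} tokens: {growth_in_length:>6.0f}x longer, "
          f"{growth_in_cost:>8.0f}x more pairwise score work")
```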

Performance Benchmarks

The model achieves state-of-the-art results among open-source thinking models across multiple reasoning domains:

  • Mathematics: 92.3% on AIME25
  • Competition Mathematics: 83.9% on HMMT25
  • Code Generation: 74.1% on LiveCodeBench
  • Academic Knowledge: 84.4% on MMLU-Pro

These results reflect the model's particular strength in tasks requiring multi-step reasoning and complex problem-solving.

Multi-modal Agent Capabilities

Beyond pure reasoning, the model features enhanced tool-calling functionality optimized for agentic workflows. Integration with the Qwen-Agent framework enables the model to function as an orchestration layer in multi-step agent applications, coordinating external tools and reasoning about action sequences.
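
As a sketch of how that orchestration might look, the example below follows the Qwen-Agent project's documented Assistant usage; the endpoint URL, API key, served model name, and tool selection are placeholder assumptions for a model hosted behind an OpenAI-compatible server.

```python
# Sketch of an agentic workflow with the Qwen-Agent framework.
# The server URL, API key, and model name are placeholder assumptions;
# the model is assumed to be served behind an OpenAI-compatible endpoint.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-235B-A22B-Thinking-2507",   # name the server exposes (assumed)
    "model_server": "http://localhost:8000/v1",  # placeholder endpoint
    "api_key": "EMPTY",
}

# Built-in code interpreter tool; custom tools can be registered as well.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user",
             "content": "Plot y = x**2 for x in [-5, 5] and describe the curve."}]

# bot.run streams the growing list of response messages (reasoning,
# tool calls, tool results) until the final answer is produced.
responses = []
for responses in bot.run(messages=messages):
    pass
print(responses[-1]["content"])
```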

Multilingual Support

The model demonstrates improved instruction-following and alignment capabilities across 81 languages, making it suitable for global deployment scenarios requiring consistent reasoning quality across linguistic boundaries.

Use Cases

The model excels in applications requiring transparent reasoning processes:

  • Mathematical problem-solving with step-by-step explanations
  • Scientific research assistance requiring logical inference
  • Code generation with reasoning about implementation choices
  • Multi-step planning in agentic systems
  • Complex decision-making requiring auditable reasoning chains
  • Educational applications where understanding the reasoning process is valuable
  • Research tasks requiring long-context analysis

Technical Considerations

The model's thinking mode is mandatory and cannot be disabled. All outputs incorporate visible reasoning tokens, which increases token consumption compared to traditional language models. Applications should account for this characteristic when designing user experiences and managing computational costs.
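
When post-processing responses, the reasoning span can be separated from the final answer. A minimal sketch, assuming the raw completion text contains a closing </think> marker as produced by the enforced template (the helper name and sample string are illustrative):

```python
# Minimal sketch: separate the visible reasoning span from the final answer.
# Assumes the raw completion contains a closing </think> marker.
def split_thinking(raw_output: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a thinking-model completion."""
    if "</think>" in raw_output:
        reasoning, _, answer = raw_output.rpartition("</think>")
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", raw_output.strip()  # no marker: treat everything as the answer


raw = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>The answer is 408."
reasoning, answer = split_thinking(raw)
print(answer)                  # -> The answer is 408.
print(len(reasoning.split()))  # rough proxy for the extra reasoning tokens billed
```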

Quick Start Guide

  1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
  2. Rent a dedicated instance preconfigured with the model you've selected.
  3. Send requests to your model instance and start getting responses right away.
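
A minimal request sketch, assuming the rented instance exposes an OpenAI-compatible endpoint (for example via vLLM); the URL, API key, and served model name below are placeholders to replace with your instance's values.

```python
# Minimal sketch of querying a deployed instance.
# Assumes an OpenAI-compatible server running on the rented GPUs;
# URL, key, and model name are placeholders for your own instance.
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-instance-ip>:8000/v1",  # placeholder
    api_key="EMPTY",                               # placeholder
)

response = client.chat.completions.create(
    model="Qwen3-235B-A22B-Thinking-2507",
    messages=[{"role": "user",
               "content": "Summarize the key ideas behind mixture-of-experts models."}],
    max_tokens=4096,  # leave headroom for the visible reasoning tokens
)

print(response.choices[0].message.content)
```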
