The DeepSeek R1-0528 release introduces a significant improvement to the reasoning capabilities of language models by making "thinking" mode accessible without complex prompt engineering or manually prepended thinking tokens. The enhanced model provides transparent, step-by-step reasoning that is particularly valuable for educational applications, complex problem solving, and scenarios where transparency in AI decision-making is crucial.
In this post, we'll explore how to deploy the DeepSeek-R1-0528-Qwen3-8B model using vLLM on Vast.ai's cloud GPU platform, leveraging the new qwen3 reasoning parser that simplifies access to the model's internal thinking process.
DeepSeek R1-0528 represents more than just a technical advancement: it's a significant step toward democratizing advanced AI capabilities. As an open-source model, it puts state-of-the-art reasoning within reach of anyone who can run it.
Unlike closed-source models, which bring vendor lock-in, per-token usage costs, and data privacy concerns, DeepSeek R1-0528 gives you complete control over your AI infrastructure. Organizations can now deploy advanced reasoning capabilities on hardware they control, maintaining data sovereignty while avoiding variable per-token pricing.
This democratization means organizations can access cutting-edge AI without massive infrastructure investments. Advanced reasoning capabilities are no longer exclusive to large tech companies—anyone with GPU access can deploy their own reasoning-capable AI system.
The DeepSeek R1-0528 release addresses a key limitation in previous DeepSeek models: the complexity of accessing the model's internal reasoning process. Previously, users needed to manually prepend "thinking" tokens to model outputs or use complex prompt engineering to see step-by-step reasoning.
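To make the contrast concrete, here is a hedged sketch of the two request styles. The exact prefill string varied by setup; the `<think>` tag shown here is illustrative.

```python
# Old workaround (illustrative): prefill the assistant turn with a thinking
# tag so the model continues its chain of thought in the open.
legacy_messages = [
    {"role": "user", "content": "If 8 x 7 = ?, show your work."},
    {"role": "assistant", "content": "<think>\n"},  # manually prepended token
]

# With R1-0528 and the qwen3 reasoning parser, a plain chat request is
# enough; the server separates reasoning from the final answer for you.
simple_messages = [
    {"role": "user", "content": "If 8 x 7 = ?, show your work."},
]

print(len(legacy_messages), len(simple_messages))
```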
First, install the Vast CLI and configure your API key:
pip install --upgrade vastai
export VAST_API_KEY="your_vast_api_key"
vastai set api-key $VAST_API_KEY
The DeepSeek-R1-0528-Qwen3-8B model has specific hardware requirements: a single GPU with at least 24 GB of VRAM and compute capability 7.5 or higher, plus roughly 60 GB of disk space for the model weights and container.
Search for suitable instances:
vastai search offers "compute_cap >= 750 \
gpu_ram >= 24 \
num_gpus = 1 \
static_ip = true \
direct_port_count >= 1 \
verified = true \
disk_space >= 60 \
rentable = true"
Deploy using the vLLM OpenAI-compatible server with the qwen3 reasoning parser:
export INSTANCE_ID="your_instance_id"
vastai create instance $INSTANCE_ID \
--image vllm/vllm-openai:latest \
--env '-p 8000:8000' \
--disk 60 \
--args --model deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
--max-model-len 4096 \
--reasoning-parser qwen3
Key deployment parameters:

- `--reasoning-parser qwen3`: Enables the simplified reasoning parser
- `--max-model-len 4096`: Accommodates longer reasoning sequences
- `--tensor-parallel-size 1`: Single GPU configuration
- `--gpu-memory-utilization 0.90`: Optimized memory usage

Navigate to the Instances tab in the Vast AI Console and locate your instance. Click the IP address button to view the forwarded ports.
Test the connection with a simple curl command:
export VAST_IP_ADDRESS="your_ip_address"
export VAST_PORT="your_port"
curl -X POST http://$VAST_IP_ADDRESS:$VAST_PORT/v1/completions \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B", "prompt": "Hello, how are you?", "max_tokens": 50}'
import time
from openai import OpenAI
# Replace with your actual Vast.ai instance details
VAST_IP_ADDRESS = "your_ip_address"
VAST_PORT = "your_port"
client = OpenAI(
api_key="DUMMY",
base_url=f"http://{VAST_IP_ADDRESS}:{VAST_PORT}/v1"
)
start = time.time()
test_resp = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
messages=[{"role": "user", "content": "Hello"}]
)
elapsed = time.time() - start
print("Response:", test_resp.choices[0].message.content)
print(f"Elapsed time: {elapsed:.4f} s")
This is where DeepSeek R1-0528 truly shines. The model naturally provides step-by-step reasoning without complex prompt engineering:
reasoning_prompt = """
Write the History of Paris in a paragraph, show your thinking
step by step first before paragraph and then write the history
in a paragraph
"""
start = time.time()
resp = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
messages=[{"role": "user", "content": reasoning_prompt}]
)
elapsed = time.time() - start
print("Model output:\n", resp.choices[0].message.content)
print(f"Elapsed time: {elapsed:.3f} s")
The response will show the model's step-by-step reasoning first, followed by the finished paragraph.
math_prompt = "If 8 × 7 = ?, show your work step by step."
resp = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
messages=[{"role": "user", "content": math_prompt}]
)
print("Mathematical reasoning:\n", resp.choices[0].message.content)
The model will naturally break down the problem, show intermediate steps, and arrive at the solution with clear reasoning traces.
No need to prepend thinking tokens or use complex prompt engineering: the qwen3 reasoning parser automatically handles the extraction and presentation of reasoning steps.
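Under the hood, reasoning models like this one emit their chain of thought between thinking tags, and the parser splits the raw completion at that boundary. Here is a minimal sketch of that extraction, assuming the `<think>...</think>` delimiter convention; the real vLLM parser also handles streaming and edge cases:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer) at the thinking tags,
    mirroring what the server-side qwen3 reasoning parser does for you."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        # No thinking tags: treat the whole completion as the answer.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

raw = "<think>8 x 7 means 8 groups of 7, so 8 x 7 = 56.</think>The answer is 56."
reasoning, answer = split_reasoning(raw)
print("Reasoning:", reasoning)  # Reasoning: 8 x 7 means 8 groups of 7, so 8 x 7 = 56.
print("Answer:", answer)        # Answer: The answer is 56.
```

With the parser enabled server-side, you get this separation without writing any client code at all.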
Perfect for educational applications where students need to understand the problem-solving process, not just the final answer.
Users can see exactly how the model arrives at its conclusions, building trust in AI-powered decision-making systems.
Running on Vast.ai provides significant cost savings compared to major cloud providers while maintaining high performance.
OpenAI-compatible API ensures seamless integration with existing applications and workflows.
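Because the server speaks the OpenAI wire format, existing applications often need nothing more than a changed base URL. As one hedged example, recent releases of the official openai Python SDK read these environment variables (the values below are placeholders):

```shell
# Point an existing OpenAI-SDK application at the Vast.ai endpoint.
# OPENAI_BASE_URL is honored by recent openai-python releases.
export OPENAI_BASE_URL="http://your_ip_address:your_port/v1"
export OPENAI_API_KEY="DUMMY"   # vLLM does not require a real key by default
```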
The DeepSeek R1-0528 release represents a significant step forward in making AI reasoning more accessible and transparent. By combining this powerful model with Vast.ai's cost-effective GPU infrastructure and the simplified qwen3 reasoning parser, developers can now easily deploy transparent, reasoning-capable AI systems.
Whether you're building educational tools, business intelligence systems, or research applications, DeepSeek R1-0528 on Vast.ai provides an excellent foundation for applications that require clear, step-by-step reasoning capabilities.
The elimination of complex prompt engineering for accessing reasoning modes makes this model particularly valuable for production applications where transparency and interpretability are key requirements.
Ready to deploy transparent, reasoning-capable AI? Get started with DeepSeek R1-0528 on Vast.ai today! 🚀
© 2025 Vast.ai. All rights reserved.