Vast.ai GPUs Can Now Be Rented Through SkyPilot

SkyPilot is an open-source framework from UC Berkeley that abstracts away cloud infrastructure, letting you define AI workloads once and run them on any provider. It handles resource discovery, provisioning, cost optimization, and failover across 25+ clouds, including Vast.ai.
With Vast.ai as a SkyPilot backend, you get access to Vast's GPU marketplace, often the lowest-cost option for on-demand GPU compute, through SkyPilot's unified interface. SkyPilot automatically finds the cheapest available instance matching your requirements, provisions it, and manages the full lifecycle.
This guide covers setup, configuration, deploying a model, and the advanced Vast-specific options available in SkyPilot.
Prerequisites
- A Vast.ai account with credits
- Python 3.7-3.13 (on Python 3.12+, install skypilot[vast] explicitly - it is excluded from skypilot[all])
- Terminal access
Installation
Install SkyPilot with Vast.ai support:
pip install -U "skypilot[vast]"
If you use uv, you can also install via:
uv pip install "skypilot[vast]"
This pulls in all required dependencies, including the Vast SDK.
Configuring Credentials
Get your API key from the Vast.ai Account page, then store it where SkyPilot expects:
mkdir -p ~/.config/vastai
echo "<YOUR_VAST_API_KEY>" > ~/.config/vastai/vast_api_key
Verify that SkyPilot detects your credentials:
sky check
You should see Vast: enabled in the output.
Deploying a Model: DeepSeek R1 on vLLM
This section walks through deploying DeepSeek-R1-0528-Qwen3-8B, an 8B-parameter model with step-by-step reasoning capabilities, on a Vast.ai H100 GPU using vLLM.
Create the YAML Configuration
Create a file named deepseek-r1-inference.yaml:
resources:
  accelerators: H100:1
  cloud: vast
  image_id: docker:vllm/vllm-openai:latest
  disk_size: 100

run: |
  vllm serve deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.9 \
    --reasoning-parser deepseek_r1
| Parameter | Purpose |
|-----------|---------|
| accelerators: H100:1 | Request 1x H100 GPU |
| cloud: vast | Use Vast.ai as the cloud provider |
| image_id: docker:vllm/vllm-openai:latest | Pre-built vLLM Docker image |
| disk_size: 100 | 100GB storage for model weights and container |
| --reasoning-parser deepseek_r1 | Enable DeepSeek's chain-of-thought reasoning output |
| --gpu-memory-utilization 0.9 | Use 90% of GPU VRAM for model serving |
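To see why 0.9 is a workable setting for this model, a rough memory budget helps. This is a back-of-envelope sketch with illustrative figures (80 GB H100 VRAM, 2 bytes per parameter in bf16), not exact vLLM accounting:

```python
# Rough VRAM budget for an 8B model on one 80 GB H100 (illustrative estimate).
h100_vram_gb = 80
utilization = 0.9                       # matches --gpu-memory-utilization 0.9

budget_gb = h100_vram_gb * utilization  # memory vLLM is allowed to claim
weights_gb = 8e9 * 2 / 1e9              # ~8B params at 2 bytes each (bf16)

headroom_gb = budget_gb - weights_gb    # left over for KV cache and activations

print(f"budget: {budget_gb:.0f} GB, weights: ~{weights_gb:.0f} GB, "
      f"KV cache headroom: ~{headroom_gb:.0f} GB")
```

With roughly 56 GB of headroom after weights, a 4096-token context fits comfortably; if you serve a larger model or longer context, this arithmetic shows why you would need to lower --max-model-len or move to a bigger GPU.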
Launch the Cluster
Deploy the service with SkyPilot:
sky launch deepseek-r1-inference.yaml --cluster deepseek-r1
SkyPilot will find the cheapest available H100 on Vast.ai, provision it, download the model (~15GB), and start the vLLM server. This takes 5-10 minutes.
Monitor progress in real time:
sky logs deepseek-r1 --follow
Confirm the cluster is ready:
sky status deepseek-r1
Look for Status: UP.
Access the Service via SSH Tunnel
Vast.ai instances provisioned through SkyPilot don't expose ports directly. Use SSH tunneling to access the vLLM server on your local machine:
ssh -L 8000:localhost:8000 deepseek-r1
Keep this terminal open - the tunnel must remain active for API access.
Test the Deployment
In a new terminal, verify the server is running:
curl http://localhost:8000/health
curl http://localhost:8000/v1/models
Send a test inference request:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    "messages": [
      {"role": "user", "content": "What is 15 * 8? Show your reasoning step by step."}
    ],
    "max_tokens": 300,
    "temperature": 0.1
  }'
Python Client Example
The vLLM server exposes an OpenAI-compatible API, so you can use the standard OpenAI Python client:
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-fake-key"  # vLLM doesn't require a real API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    messages=[{
        "role": "user",
        "content": "Write a brief history of Los Angeles in one paragraph."
    }],
    max_tokens=300,
    temperature=0.7
)

print(response.choices[0].message.content)
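With --reasoning-parser deepseek_r1 enabled, vLLM separates the model's chain-of-thought from its final answer; the reasoning typically arrives in a separate reasoning_content attribute on the message, while setups without the parser leave it inline between <think> tags. A small helper can normalize both cases. This is a sketch: the reasoning_content attribute name and <think> tag format are assumptions based on vLLM's reasoning-parser behavior, so verify them against your server's actual responses:

```python
import re

def split_reasoning(message):
    """Return (reasoning, answer) from a chat completion message.

    Handles two cases: a separate `reasoning_content` attribute (set when a
    reasoning parser is active) or inline <think>...</think> tags in content.
    """
    reasoning = getattr(message, "reasoning_content", None)
    content = getattr(message, "content", "") or ""
    if reasoning:
        return reasoning.strip(), content.strip()
    # Fall back to stripping inline <think> tags out of the content itself.
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match:
        answer = content[match.end():].strip()
        return match.group(1).strip(), answer
    return "", content.strip()

# Works with any object exposing .content (and optionally .reasoning_content),
# including response.choices[0].message from the client example above:
class FakeMessage:  # stand-in for a real response message
    content = "<think>15 * 8 = 120</think>The answer is 120."

reasoning, answer = split_reasoning(FakeMessage())
print(reasoning)  # 15 * 8 = 120
print(answer)     # The answer is 120.
```

Keeping the reasoning and the answer separate is useful when you want to log or display the chain-of-thought without sending it back into the conversation history.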
Managing Your Cluster
# Check status
sky status deepseek-r1
# View logs
sky logs deepseek-r1
# SSH into the instance
ssh deepseek-r1
# Stop the cluster (preserves data, stops billing)
sky stop deepseek-r1
# Restart a stopped cluster
sky start deepseek-r1
# Auto-stop after 30 minutes of idle time
sky autostop deepseek-r1 --idle-minutes 30
# View all clusters and costs
sky status --all
# Tear down completely
sky down deepseek-r1
Always stop or tear down clusters when you're done to avoid unnecessary charges.
Advanced: Vast Configuration in SkyPilot
SkyPilot supports Vast-specific configuration in ~/.sky/config.yaml. These options let you control instance selection and pass parameters directly to the Vast API.
Datacenter-Only Instances
Filter to professional datacenter-hosted machines only. This excludes consumer-grade or home-hosted GPUs, which can improve reliability:
vast:
  datacenter_only: true
Note that some GPU types may only be available on non-datacenter offers, so enabling this may reduce availability.
Custom Instance Parameters
The create_instance_kwargs block passes parameters directly to the Vast API when creating instances. This gives you access to Vast features beyond what SkyPilot's standard YAML exposes:
vast:
  datacenter_only: true
  create_instance_kwargs:
    python_utf8: true
    lang_utf8: true
    extra: "--shm-size=16g"
    onstart_cmd: "echo 'Instance started'"
Supported Parameters
| Parameter | Description |
|-----------|-------------|
| image | Docker image override (e.g., vastai/base-image:@vastai-automatic-tag) |
| env | Environment variables and port mappings (e.g., "-e KEY=value -p 8080:8080") |
| price / bid_price | Maximum bid price for preemptible (spot) instances |
| disk | Disk size in GB (overrides disk_size in task YAML) |
| label | Custom instance label |
| extra | Extra docker run arguments (e.g., "--shm-size=16g") |
| onstart_cmd | Shell command to run on instance start |
| onstart | Path to a local script to run on instance start |
| login / image_login | Docker registry credentials for private images |
| python_utf8 | Enable Python UTF-8 mode |
| lang_utf8 | Enable system UTF-8 locale |
| jupyter_lab | Start JupyterLab on the instance |
| jupyter_dir | Jupyter notebook directory path |
| template_hash_id | Use a pre-configured Vast template by hash ID |
| args | Custom Docker command arguments |
| user | Run the container as a specific user |
| vm | Use VM mode instead of container mode |
Using Vast Templates
If you have a saved Vast template, you can reference it by hash ID. When using a template, image and disk are inherited from the template:
vast:
  create_instance_kwargs:
    template_hash_id: "abc123def456"
    price: 0.50
Spot (Preemptible) Instances
Request spot instances for lower costs. SkyPilot handles preemption recovery automatically when using managed jobs:
resources:
  accelerators: H100:1
  cloud: vast
  use_spot: true
You can set a maximum bid price via create_instance_kwargs:
vast:
  create_instance_kwargs:
    bid_price: 0.50
Current Limitations
Vast.ai support in SkyPilot does not currently include:
- Multi-node clusters - interconnection between nodes is non-trivial on Vast
- Object store mounting - no direct mounting of S3, GCS, etc.
- Port opening after launch - ports must be configured at launch time
- Custom disk or network tiers - Vast does not expose these options
Troubleshooting
"Vast: not enabled" After sky check
Verify your API key file exists and is readable:
cat ~/.config/vastai/vast_api_key
If the file is missing, re-run the credential setup steps above. If you previously stored the key at ~/.vast_api_key, note that SkyPilot now expects it at ~/.config/vastai/vast_api_key.
SSH Tunnel Connection Refused
Confirm the cluster is running before attempting the tunnel:
sky status deepseek-r1
If the status is UP, recreate the tunnel:
ssh -L 8000:localhost:8000 deepseek-r1
Model Not Loading or Out of Memory
SSH into the instance and check GPU utilization:
ssh deepseek-r1
nvidia-smi
If the model exceeds available VRAM, reduce --gpu-memory-utilization to 0.7 or --max-model-len to a smaller value in your YAML, then redeploy.
Slow Startup
Large Docker images (vLLM is ~15GB) take time to download. Model weights add another 5-30GB depending on the model. First launch on a new instance can take 10+ minutes. Use sky logs <cluster> --follow to monitor progress.
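As a sanity check on those startup times, the transfer alone accounts for several minutes. The figures below come from this guide (~15 GB image, ~15 GB of weights); the 1 Gbit/s link speed is illustrative, since actual bandwidth varies widely between Vast hosts:

```python
# Back-of-envelope first-launch transfer estimate (illustrative numbers).
image_gb = 15    # vLLM Docker image size (from this guide)
weights_gb = 15  # DeepSeek-R1-0528-Qwen3-8B weights (~15 GB, from this guide)

total_gbit = (image_gb + weights_gb) * 8       # total transfer in gigabits
minutes_at_1gbps = total_gbit / 1.0 / 60       # assuming a 1 Gbit/s link

print(f"~{minutes_at_1gbps:.0f} min of pure download at 1 Gbit/s")
```

Image extraction, CUDA initialization, and loading the weights into VRAM add further minutes on top of the raw download, which is why first launches routinely exceed 10 minutes.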
Cleanup
Tear down the cluster when you're done to stop all charges:
sky down deepseek-r1
Next Steps
- Try other models: Swap the model in the YAML to serve Llama 3, Mistral, or Qwen
- Use managed jobs: Run sky jobs launch instead of sky launch for automatic recovery from preemptions
- Explore spot instances: Add use_spot: true to your resource block for lower costs
- Datacenter filtering: Set datacenter_only: true in ~/.sky/config.yaml for production workloads
- SkyPilot docs: Full reference at docs.skypilot.co


