Vast.ai GPUs Can Now Be Rented Through SkyPilot

SkyPilot is an open-source framework from UC Berkeley that abstracts away cloud infrastructure, letting you define AI workloads once and run them on any provider. It handles resource discovery, provisioning, cost optimization, and failover across 25+ clouds, including Vast.ai.
With Vast.ai as a SkyPilot backend, you get access to Vast's GPU marketplace, often the lowest-cost option for on-demand GPU compute, through SkyPilot's unified interface. SkyPilot automatically finds the cheapest available instance matching your requirements, provisions it, and manages the full lifecycle.
This guide covers setup, configuration, deploying a model, and the advanced Vast-specific options available in SkyPilot.
Prerequisites
- A Vast.ai account with credits
- Python 3.7-3.13 (on Python 3.12+, install skypilot[vast] explicitly - it is excluded from skypilot[all])
- Terminal access
Installation
Install SkyPilot with Vast.ai support:
pip install -U "skypilot[vast]"
If you use uv, you can also install via:
uv pip install "skypilot[vast]"
This pulls in all required dependencies, including the Vast SDK.
Configuring Credentials
Get your API key from the Vast.ai Account page, then store it where SkyPilot expects:
mkdir -p ~/.config/vastai
echo "<YOUR_VAST_API_KEY>" > ~/.config/vastai/vast_api_key
Verify that SkyPilot detects your credentials:
sky check
You should see Vast: enabled in the output.
Deploying a Model: DeepSeek R1 on vLLM
This section walks through deploying DeepSeek-R1-0528-Qwen3-8B, an 8B-parameter model with step-by-step reasoning capabilities, on a Vast.ai H100 GPU using vLLM.
Create the YAML Configuration
Create a file named deepseek-r1-inference.yaml:
resources:
  accelerators: H100:1
  cloud: vast
  image_id: docker:vllm/vllm-openai:latest
  disk_size: 100

run: |
  vllm serve deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.9 \
    --reasoning-parser deepseek_r1
| Parameter | Purpose |
|-----------|---------|
| accelerators: H100:1 | Request 1x H100 GPU |
| cloud: vast | Use Vast.ai as the cloud provider |
| image_id: docker:vllm/vllm-openai:latest | Pre-built vLLM Docker image |
| disk_size: 100 | 100GB storage for model weights and container |
| --reasoning-parser deepseek_r1 | Enable DeepSeek's chain-of-thought reasoning output |
| --gpu-memory-utilization 0.9 | Use 90% of GPU VRAM for model serving |
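To see why 0.9 is a workable setting for this model, a rough memory budget helps. This is a back-of-envelope sketch with illustrative figures (80 GB H100 VRAM, 2 bytes per parameter in bf16), not exact vLLM accounting:

```python
# Rough VRAM budget for an 8B model on one 80 GB H100 (illustrative estimate).
h100_vram_gb = 80
utilization = 0.9                       # matches --gpu-memory-utilization 0.9

budget_gb = h100_vram_gb * utilization  # memory vLLM is allowed to claim
weights_gb = 8e9 * 2 / 1e9              # ~8B params at 2 bytes each (bf16)

headroom_gb = budget_gb - weights_gb    # left over for KV cache and activations

print(f"budget: {budget_gb:.0f} GB, weights: ~{weights_gb:.0f} GB, "
      f"KV cache headroom: ~{headroom_gb:.0f} GB")
```

With roughly 56 GB of headroom after weights, a 4096-token context fits comfortably; if you serve a larger model or longer context, this arithmetic shows why you would need to lower --max-model-len or move to a bigger GPU.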
Launch the Cluster
Deploy the service with SkyPilot:
sky launch deepseek-r1-inference.yaml --cluster deepseek-r1
SkyPilot will find the cheapest available H100 on Vast.ai, provision it, download the model (~15GB), and start the vLLM server. This takes 5-10 minutes.
Monitor progress in real time:
sky logs deepseek-r1 --follow
Confirm the cluster is ready:
sky status deepseek-r1
Look for Status: UP.
Access the Service via SSH Tunnel
Vast.ai instances provisioned through SkyPilot don't expose ports directly. Use SSH tunneling to access the vLLM server on your local machine:
ssh -L 8000:localhost:8000 deepseek-r1
Keep this terminal open - the tunnel must remain active for API access.
Test the Deployment
In a new terminal, verify the server is running:
curl http://localhost:8000/health
curl http://localhost:8000/v1/models
Send a test inference request:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    "messages": [
      {"role": "user", "content": "What is 15 * 8? Show your reasoning step by step."}
    ],
    "max_tokens": 300,
    "temperature": 0.1
  }'
Python Client Example
The vLLM server exposes an OpenAI-compatible API, so you can use the standard OpenAI Python client:
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-fake-key"  # vLLM doesn't require a real API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    messages=[{
        "role": "user",
        "content": "Write a brief history of Los Angeles in one paragraph."
    }],
    max_tokens=300,
    temperature=0.7
)

print(response.choices[0].message.content)
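With --reasoning-parser deepseek_r1 enabled, vLLM separates the model's chain-of-thought from its final answer; the reasoning typically arrives in a separate reasoning_content attribute on the message, while setups without the parser leave it inline between <think> tags. A small helper can normalize both cases. This is a sketch: the reasoning_content attribute name and <think> tag format are assumptions based on vLLM's reasoning-parser behavior, so verify them against your server's actual responses:

```python
import re

def split_reasoning(message):
    """Return (reasoning, answer) from a chat completion message.

    Handles two cases: a separate `reasoning_content` attribute (set when a
    reasoning parser is active) or inline <think>...</think> tags in content.
    """
    reasoning = getattr(message, "reasoning_content", None)
    content = getattr(message, "content", "") or ""
    if reasoning:
        return reasoning.strip(), content.strip()
    # Fall back to stripping inline <think> tags out of the content itself.
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match:
        answer = content[match.end():].strip()
        return match.group(1).strip(), answer
    return "", content.strip()

# Works with any object exposing .content (and optionally .reasoning_content),
# including response.choices[0].message from the client example above:
class FakeMessage:  # stand-in for a real response message
    content = "<think>15 * 8 = 120</think>The answer is 120."

reasoning, answer = split_reasoning(FakeMessage())
print(reasoning)  # 15 * 8 = 120
print(answer)     # The answer is 120.
```

Keeping the reasoning and the answer separate is useful when you want to log or display the chain-of-thought without sending it back into the conversation history.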
Managing Your Cluster
# Check status
sky status deepseek-r1
# View logs
sky logs deepseek-r1
# SSH into the instance
ssh deepseek-r1
# Stop the cluster (preserves data, stops billing)
sky stop deepseek-r1
# Restart a stopped cluster
sky start deepseek-r1
# Auto-stop after 30 minutes of idle time
sky autostop deepseek-r1 --idle-minutes 30
# View all clusters and costs
sky status --all
# Tear down completely
sky down deepseek-r1
Always stop or tear down clusters when you're done to avoid unnecessary charges.
Advanced: Vast Configuration in SkyPilot
SkyPilot supports Vast-specific configuration in ~/.sky/config.yaml. These options let you control instance selection and pass parameters directly to the Vast API.
Datacenter-Only Instances
Filter to professional datacenter-hosted machines only. This excludes consumer-grade or home-hosted GPUs, which can improve reliability:
vast:
  datacenter_only: true
Note that some GPU types may only be available on non-datacenter offers, so enabling this may reduce availability.
Custom Instance Parameters
The create_instance_kwargs block passes parameters directly to the Vast API when creating instances. This gives you access to Vast features beyond what SkyPilot's standard YAML exposes:
vast:
  datacenter_only: true
  create_instance_kwargs:
    python_utf8: true
    lang_utf8: true
    extra: "--shm-size=16g"
    onstart_cmd: "echo 'Instance started'"
Supported Parameters
| Parameter | Description |
|-----------|-------------|
| image | Docker image override (e.g., vastai/base-image:@vastai-automatic-tag) |
| env | Environment variables and port mappings (e.g., "-e KEY=value -p 8080:8080") |
| price / bid_price | Maximum bid price for preemptible (spot) instances |
| disk | Disk size in GB (overrides disk_size in task YAML) |
| label | Custom instance label |
| extra | Extra docker run arguments (e.g., "--shm-size=16g") |
| onstart_cmd | Shell command to run on instance start |
| onstart | Path to a local script to run on instance start |
| login / image_login | Docker registry credentials for private images |
| python_utf8 | Enable Python UTF-8 mode |
| lang_utf8 | Enable system UTF-8 locale |
| jupyter_lab | Start JupyterLab on the instance |
| jupyter_dir | Jupyter notebook directory path |
| template_hash_id | Use a pre-configured Vast template by hash ID |
| args | Custom Docker command arguments |
| user | Run the container as a specific user |
| vm | Use VM mode instead of container mode |
Using Vast Templates
If you have a saved Vast template, you can reference it by hash ID. When using a template, image and disk are inherited from the template:
vast:
  create_instance_kwargs:
    template_hash_id: "abc123def456"
    price: 0.50
Spot (Preemptible) Instances
Request spot instances for lower costs. SkyPilot handles preemption recovery automatically when using managed jobs:
resources:
  accelerators: H100:1
  cloud: vast
  use_spot: true
You can set a maximum bid price via create_instance_kwargs:
vast:
  create_instance_kwargs:
    bid_price: 0.50
Current Limitations
Vast.ai support in SkyPilot does not currently include:
- Multi-node clusters - interconnection between nodes is non-trivial on Vast
- Object store mounting - no direct mounting of S3, GCS, etc.
- Port opening after launch - ports must be configured at launch time
- Custom disk or network tiers - Vast does not expose these options
Troubleshooting
"Vast: not enabled" After sky check
Verify your API key file exists and is readable:
cat ~/.config/vastai/vast_api_key
If the file is missing, re-run the credential setup steps above. If you previously stored the key at ~/.vast_api_key, note that SkyPilot now expects it at ~/.config/vastai/vast_api_key.
SSH Tunnel Connection Refused
Confirm the cluster is running before attempting the tunnel:
sky status deepseek-r1
If the status is UP, recreate the tunnel:
ssh -L 8000:localhost:8000 deepseek-r1
Model Not Loading or Out of Memory
SSH into the instance and check GPU utilization:
ssh deepseek-r1
nvidia-smi
If the model exceeds available VRAM, reduce --gpu-memory-utilization to 0.7 or --max-model-len to a smaller value in your YAML, then redeploy.
Slow Startup
Large Docker images (vLLM is ~15GB) take time to download. Model weights add another 5-30GB depending on the model. First launch on a new instance can take 10+ minutes. Use sky logs <cluster> --follow to monitor progress.
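As a sanity check on those startup times, the transfer alone accounts for several minutes. The figures below come from this guide (~15 GB image, ~15 GB of weights); the 1 Gbit/s link speed is illustrative, since actual bandwidth varies widely between Vast hosts:

```python
# Back-of-envelope first-launch transfer estimate (illustrative numbers).
image_gb = 15    # vLLM Docker image size (from this guide)
weights_gb = 15  # DeepSeek-R1-0528-Qwen3-8B weights (~15 GB, from this guide)

total_gbit = (image_gb + weights_gb) * 8       # total transfer in gigabits
minutes_at_1gbps = total_gbit / 1.0 / 60       # assuming a 1 Gbit/s link

print(f"~{minutes_at_1gbps:.0f} min of pure download at 1 Gbit/s")
```

Image extraction, CUDA initialization, and loading the weights into VRAM add further minutes on top of the raw download, which is why first launches routinely exceed 10 minutes.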
Cleanup
Tear down the cluster when you're done to stop all charges:
sky down deepseek-r1
Next Steps
- Try other models: Swap the model in the YAML to serve Llama 3, Mistral, or Qwen
- Use managed jobs: Run sky jobs launch instead of sky launch for automatic recovery from preemptions
- Explore spot instances: Add use_spot: true to your resource block for lower costs
- Datacenter filtering: Set datacenter_only: true in ~/.sky/config.yaml for production workloads
- SkyPilot docs: Full reference at docs.skypilot.co


