Topics

Introduction

DLPerf

Rental Types

Instances

Networking

Jupyter

Security

Billing

Data Movement

Examples

- Disco Diffusion

- Stable Diffusion

- Nvidia-GLX-Desktop

- Bittensor

Troubleshooting

Hosting

Introduction

What is Vast.ai?

Vast.ai is a cloud computing, matchmaking, and aggregation service focused on lowering the price of compute-intensive workloads. Our software allows anyone to easily become a host by renting out their hardware. Our web search interface allows users to quickly find the best deals for compute according to their specific requirements.

What is the Secure Cloud (Only Trusted Datacenters) filter?

Vast.ai partners with vetted datacenter providers all over the globe. These partners have their equipment in certified locations that are current with ISO 27001 and/or Tier 3/4 standards. Vast.ai has verified that this equipment is in these facilities and that their certifications are up to date.

For sensitive or production workloads, we recommend checking the "secure cloud" filter. Look for the blue datacenter label.

How does Vast.ai work in a nutshell?

Hosts download and run our management software, list their machines, configure prices, and set any default jobs. Clients then find suitable machines using our flexible search interface, rent their desired machines, and finally run commands or start SSH sessions with a few clicks.

What are Vast's advantages?

Vast.ai provides a simple interface to rent powerful machines at the best possible prices, reducing GPU cloud computing costs by ~3x to 5x. Consumer computers and consumer GPUs, in particular, are considerably more cost-effective than equivalent enterprise hardware. We are helping the millions of underutilized consumer GPUs around the world enter the cloud computing market for the first time.

What operating systems are provided? Windows?

Vast currently provides Linux Docker instances, mostly Ubuntu-based, no Windows.

What interface is provided?

Currently, Vast has SSH access (for SSH instances), Jupyter instances with Jupyter GUI, or a command-only instance mode. We do not provide remote desktop.

DLPerf

What is DLPerf?

DLPerf (Deep Learning Performance) is our own scoring function. It is an approximate estimate of performance for typical deep learning tasks. Currently, DLPerf predicts performance well in terms of iters/second for a few common tasks such as training ResNet50 CNNs. For example, on these tasks, a V100 instance with a DLPerf score of 21 is roughly 2x faster than a 1080Ti with a DLPerf score of 10.

It turns out that many tasks have similar performance characteristics, but naturally, if your task is very unusual in its compute requirements, the DLPerf score may not be very predictive. A single score can never be accurate for predicting performance across a wide variety of tasks; the best we can do is approximate performance on many tasks with a weighted combination. Although far from perfect, DLPerf is more useful for predicting performance than TFLops for most tasks.
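As a rough illustration of how the scores can be compared, the sketch below computes the ratio of two DLPerf scores and a DLPerf-per-dollar figure. The function names and the hourly prices are hypothetical; only the scores 21 and 10 come from the example above.

```shell
# Ratio of two DLPerf scores: a rough estimate of relative speed.
dlperf_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# DLPerf per dollar per hour: higher means more estimated performance for the price.
# The price argument is a hypothetical hourly rate, not a real quote.
dlperf_per_dollar() {
    awk -v score="$1" -v price="$2" 'BEGIN { printf "%.1f\n", score / price }'
}

dlperf_ratio 21 10          # V100 vs 1080Ti scores from the example above
dlperf_per_dollar 21 0.50   # hypothetical $0.50/hour offer
dlperf_per_dollar 10 0.20   # hypothetical $0.20/hour offer
```

With these made-up prices the slower card would give more DLPerf per dollar, so the better deal depends on price as well as the score.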

Rental Types

We currently offer two rental types: On-Demand (High Priority) and Interruptible (Low Priority). On-Demand instances have a fixed price set by the host and run for as long as the client wants. Interruptible instances use a bidding system: clients set a bid price for their instance, the instance with the current highest bid runs, and the others are paused.
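The bidding rule can be pictured with a small sketch. This is only an illustration of "the highest bid runs", not the actual scheduler; the input format, the labels, and the function name are assumptions.

```shell
# Input on stdin: one "<label> <bid>" pair per line (hypothetical format).
# Prints the label of the highest bid, i.e. the instance that would run.
winning_bid() {
    LC_ALL=C sort -k2,2nr | head -n 1 | awk '{ print $1 }'
}

printf '%s\n' 'inst_a 0.20' 'inst_b 0.35' 'inst_c 0.10' | winning_bid
```

Here inst_b has the highest bid, so it would run while the other two are paused.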

Are vast.ai interruptible instances the same as AWS spot or GCE interruptible?

They are similar but have a few key differences. AWS spot instances and GCE interruptible instances both can be interrupted by on-demand instances, but they do not use a direct bidding system. In addition, GCE interruptible instances can only run for 24 hours. Vast.ai interruptible instances use a direct bidding system but are otherwise not limited.

What happens when my interruptible instance loses the bid?

If another user places a higher bid or creates an on-demand rental for the same resources, then your instance will be stopped. Stopping an instance kills the running processes, so when you are using interruptible instances, it's important to save your work to disk. Also, we highly recommend having your script periodically save your outputs to cloud storage as well because once your instance is interrupted, it could be a long wait until it resumes.
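One way to make a job survive interruptions is to record progress on disk after every unit of work and resume from that record on restart. The sketch below is a minimal example of that pattern; the function name, the state file, and the step counter are hypothetical stand-ins for your own job.

```shell
# Run steps up to total_steps, saving progress to state_file after each one.
# If state_file already exists, resume from the step recorded there.
run_with_checkpoints() {
    state_file="$1"
    total_steps="$2"
    step=0
    if [ -f "$state_file" ]; then
        step="$(cat "$state_file")"
    fi
    while [ "$step" -lt "$total_steps" ]; do
        # ... one unit of real work goes here (and, ideally, a copy to cloud storage) ...
        step=$((step + 1))
        # Write to a temp file and rename it, so a kill mid-write cannot corrupt the state.
        printf '%s\n' "$step" > "$state_file.tmp"
        mv "$state_file.tmp" "$state_file"
    done
    printf '%s\n' "$step"
}

demo_state="$(mktemp -d)/progress.txt"
run_with_checkpoints "$demo_state" 3
```

Calling the function again with the same state file and a larger step count continues from the saved step instead of starting over.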

Instances

How can I restart my programs once the instance restarts?

If you use the "custom command" option, then your command will run automatically when the instance starts up. However, if you are using an SSH instance, there is no default startup command. You can put startup commands in "/root/onstart.sh". This startup script will be found and run automatically on container startup.

I see my instance has a Lifetime - what does that mean?

Every instance offer on the Create page has a Max Duration. When you accept an offer and create an instance, this Max Duration becomes the instance lifetime and begins ticking down. When the lifetime expires, the instance is automatically stopped. The host can extend the contract, which will add more lifetime to your instance, or they may not - it's up to them. Assume your instance will be lost once the lifetime expires; copy out any important data before then.

How can I set environment variables?

Use the -e docker syntax in the docker create/run options. For example, to set the env variables TZ to UTC and TASKID to "TEST":

-e TZ=UTC -e TASKID="TEST"

Any environment variables you set will be visible only to your onstart script (or your entrypoint for entrypoint launch mode). When using the SSH or Jupyter launch modes, your env variables will not be visible inside your SSH/tmux/Jupyter session by default. To make custom environment variables visible to the shell, you need to export them to /etc/environment.

Add something like the following to the end of your onstart to export any env variables containing an underscore '_':

env | grep _ >> /etc/environment;

Or to export all env variables:

env >> /etc/environment;
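Because the onstart script runs on every container start, appending with >> can add the same lines again each time. A possible variation, sketched below, appends only the variables you name and skips names already present in the target file. The function name is hypothetical, and the demo writes to a temporary file rather than /etc/environment.

```shell
# Append NAME=VALUE for each named variable to a target file,
# skipping unset variables and names the file already contains.
export_vars_to() {
    target="$1"
    shift
    for name in "$@"; do
        if value="$(printenv "$name")" && ! grep -q "^${name}=" "$target" 2>/dev/null; then
            printf '%s=%s\n' "$name" "$value" >> "$target"
        fi
    done
}

demo_env="$(mktemp)"
TASKID="TEST"
export TASKID
export_vars_to "$demo_env" TZ TASKID
export_vars_to "$demo_env" TZ TASKID   # second run adds nothing new
cat "$demo_env"
```

In an onstart script the call would look like: export_vars_to /etc/environment TZ TASKID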

How can I get the instance ID from within the container?

The environment variable VAST_CONTAINERLABEL is defined in the container. Example:

root@C.38250:~$ echo $VAST_CONTAINERLABEL
C.38250
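If only the numeric part is needed, the prefix can be stripped with shell parameter expansion. This sketch assumes the label always has the form "C." followed by the ID, as in the example above; the function name is hypothetical.

```shell
# Strip the leading "C." from a container label such as "C.38250".
label_to_id() {
    printf '%s\n' "${1#C.}"
}

label_to_id "C.38250"
# On an instance: label_to_id "$VAST_CONTAINERLABEL"
```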

How can I stop the instance from within the instance?

A special instance api key should already be installed in your container. You can just install the vastai CLI and use the stop command:

root@C.38250:~$ pip install vastai;

Then test it by starting the instance (which is a no-op as the instance is already running):

vastai start instance $CONTAINER_ID;

If that works then you can stop the instance as well:

vastai stop instance $CONTAINER_ID;

If $CONTAINER_ID is not defined check your environment variables using the 'env' command. If you are missing the predefined env variables from an ssh session you may need to add a command to export them to /etc/environment (see above).
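A common use of the stop command is to shut the instance down as soon as a job finishes. The sketch below wraps that idea in a small function. VAST_CLI is a hypothetical override added here so the function can be dry-run without the real CLI; it defaults to vastai.

```shell
# Run a command, then stop this instance so active rental charges end.
# Returns the exit status of the command that was run.
run_then_stop() {
    "$@"
    job_status=$?
    if [ -n "${CONTAINER_ID:-}" ]; then
        "${VAST_CLI:-vastai}" stop instance "$CONTAINER_ID"
    else
        echo "CONTAINER_ID is not set; instance left running" >&2
    fi
    return "$job_status"
}

# Dry run: with VAST_CLI=echo the stop command is printed instead of executed.
CONTAINER_ID=38250 VAST_CLI=echo run_then_stop true
```

On a real instance you would call it without the overrides, for example: run_then_stop python train.py (train.py being a placeholder for your own job).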

If you don't have the instance api key for whatever reason, you can also generate it. First run the following from inside the instance to create a special per instance api key and save it in the appropriate location:

cat ~/.ssh/authorized_keys | md5sum | awk '{print $1}' > ssh_key_hv;
echo -n $VAST_CONTAINERLABEL | md5sum | awk '{print $1}' > instance_id_hv;
head -c -1 -q ssh_key_hv instance_id_hv > ~/.vast_api_key;

Then install the CLI with pip or download latest from github:

apt-get install -y wget; wget https://raw.githubusercontent.com/vast-ai/vast-python/master/vast.py -O vast; chmod +x vast;

How can I launch another Docker container from within the instance?

Vast does not currently support Docker within Docker due to security constraints. You will need to launch each docker container on a separate instance.

Networking

How can I open custom ports?

Add -p arguments in the docker create/run options box in the template configuration or image config editor pop-up menu. To open ports 8081 and 8082, use something like this:

-p 8081:8081 -p 8082:8082

This will result in additional arguments to docker create/run to expose those internal ports, which will be mapped to random external ports. Any ports exposed in these docker options are in addition to ports exposed through EXPOSE commands in the docker image, and the ports 22 or 8080 which may be opened automatically for SSH or Jupyter.

After the instance has loaded, you can find the corresponding external public IP:port by opening the IP Port Info pop-up (button on top of the instance) and then looking for something like:

65.130.162.74:33526 -> 8081/tcp

In this case, the public IP:port 65.130.162.74:33526 can be used to access anything you run on port 8081 inside the instance. As a simple test case, you can run a minimal web server inside the instance with the following command:

python -m http.server 8081

Which you would then access in this example by loading 65.130.162.74:33526 in your web browser.
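If you copy the mapping lines out of the IP Port Info pop-up, a small filter can look up the external address for a given internal port. This is a sketch that assumes the lines keep the "IP:port -> port/tcp" shape shown above; the function name and the second mapping line are hypothetical.

```shell
# stdin: mapping lines such as "65.130.162.74:33526 -> 8081/tcp".
# $1: internal port. Prints the matching external IP:port.
external_for_port() {
    awk -v port="$1" '$2 == "->" && $3 == (port "/tcp") { print $1 }'
}

printf '%s\n' \
    '65.130.162.74:33526 -> 8081/tcp' \
    '65.130.162.74:33527 -> 8082/tcp' | external_for_port 8081
```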

How can I open an identity port map like 32001:32001 where external:internal are the same?

Just use an out-of-range port above 70000:

-p 70000:70000 -p 70001:70001

Jupyter

I'm getting very slow transfer speeds using the jupyter download/upload?

You probably created a proxy instance, which is the default. The proxy instance is still useful for some use cases as you can get full speed downloading files with wget, git, external ftp, cloud storage, etc. However the built-in upload/download buttons can be very slow (especially when the proxy servers are overloaded). So if you want full speed transfers using the jupyter upload/download GUI, you need to create a direct https instance. On the create page, select EDIT IMAGE & CONFIG, open your image template (usually pytorch), then select the direct https mode under jupyter. Then create a new instance. You will need to import a certificate (see below).

What is this HTTPS website unsecure warning?

The jupyter direct HTTPS option is faster than the proxy option, but it requires installing a new certificate in your browser: Download the certificate file, then go into your browser certificate settings and add the new certificate. In Google Chrome you click the little 3-dot menu in the top right corner, then settings in the drop-down menu, then Privacy and security on the left, then Security in the middle, then scroll down to the Advanced section and click on Manage certificates, then the Authorities tab, and finally the Import button and select the certificate file you just downloaded (jvastai_root.cer).

I'm deleting files in Jupyter but it's not freeing disk space! How do I truly delete?

By design/default the delete button in Jupyter does not actually delete files, it just moves them to the Trash folder, which is located at:

~/.local/share/Trash

So you can delete the trash folder in a terminal using rm -r:

rm -r ~/.local/share/Trash
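To see how much space the Trash folder is holding before removing it, something like the following sketch can be used. The function name is hypothetical, and the demo runs against a temporary directory rather than your real Trash.

```shell
# Report the size of a Trash directory, then remove it.
# $1 defaults to the Jupyter Trash location given above.
empty_trash() {
    trash_dir="${1:-$HOME/.local/share/Trash}"
    if [ -d "$trash_dir" ]; then
        du -sh "$trash_dir"
        rm -r "$trash_dir"
    else
        echo "nothing to delete at $trash_dir"
    fi
}

demo_trash="$(mktemp -d)/Trash"
mkdir -p "$demo_trash/files"
empty_trash "$demo_trash"
```

Calling empty_trash with no argument would act on ~/.local/share/Trash.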

How do I run colab notebooks?

Just select the recommended pytorch image which has the jupyter launch mode pre-enabled. Select a GPU and start the instance. Once it loads, click on the jupyter button on the bottom right of the instance card to start jupyter. Then download the colab notebook as a .ipynb file and upload it to the instance in Jupyter. Then just click on that to run the notebook. Depending on the notebook you may need to install additional dependencies with apt-get or pip. As of now we don't have a recommended colab-emulating docker image.

I'm getting some missing library or package error?

Depending on the notebook, you may have to install additional dependencies. You can do this by opening a terminal in jupyter and then using regular apt-get install PACKAGE or pip install PACKAGE.

How can I more easily download many files?

JupyterLab supports downloading multiple files: shift-click to select them, then right-click and choose the download option. Jupyter Notebook, however, only supports downloading individual files, and neither supports downloading folders/directories. You can use zip to more quickly download directories and large numbers of files. First open a terminal, and then in the terminal you can install zip and use that to zip up many files into a single package:

apt-get install -y zip

And then to zip only the files in the "images_out/TimeToDisco" directory:

zip all_images.zip images_out/TimeToDisco/*

Or zip all of the files in "images_out" including sub-directories:

zip -r all_images.zip images_out/

Jupyter is ok, but can I run colab directly with a vast instance?

Yes. Please follow our Colab guide.

SSH

How do I connect to an SSH instance on linux/mac?

On Ubuntu or Mac, first you need to generate an rsa ssh public/private keypair using the command:

ssh-keygen -t rsa

Next you may need to force the daemon to load the new private key, and confirm it's loaded:

ssh-add; ssh-add -l

Then get the contents of the public key with:

cat ~/.ssh/id_rsa.pub

Copy the entire output to your clipboard, then paste that into the "Change SSH Key" text box under console/account. The key text includes the opening "ssh-rsa" part and the ending "user@something" part. If you don't copy the entire thing, it won't work.

Example SSH key text:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAZBAQDdxWwxwN5Lz7ubkMrxM57CHhVzOnZuLt5FHi7J8zFXCJHfr96w+ccBOBo2rtBBTTRDLnJjIsKLgBcC3+jGyZhpUNMFRVIJ7MeqdEHgHFvUUV/uBkb7RjbyyFcb4BCSYNggUZkMNNoNgEa3aqtBSzt47bnuGqKszs9bfACaPFtr9Wo0b8p4IYil/gfOY5kuSVwkqrBCWrg53/+T2rAk/02mWNHXyBktJAu1q9qTWcyO68JTDd0sa+4apSu+CsJMBJs3FcDDRAl3bcpiKwRbCkQ+N63ol4xDV3zQRebUc98CJPh04Gnc41W02lmdqFL2XG5U/rV8/JM7CawKiIz3dbkv bob@velocity

You can use a few SSH keys by pasting in each on a new line.
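A quick sanity check before pasting can catch a partially copied key. The sketch below only mirrors the rule of thumb above (an "ssh-rsa" prefix, the key body, and a trailing comment); it does not validate the key itself, and the function name and the shortened sample key are hypothetical.

```shell
# Succeeds when the line on stdin has the three parts described above:
# the "ssh-rsa" prefix, the key body, and the trailing comment.
pubkey_looks_complete() {
    awk 'NF >= 3 && $1 == "ssh-rsa" { ok = 1 } END { exit !ok }'
}

if printf '%s\n' 'ssh-rsa AAAAB3Nza bob@velocity' | pubkey_looks_complete; then
    echo "key looks complete"
else
    echo "key looks truncated"
fi
# On your own machine: pubkey_looks_complete < ~/.ssh/id_rsa.pub
```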

How do I connect to an SSH instance from windows?

You can use windows subsystem for linux and then follow the ssh instructions for linux/mac. But some windows users prefer a GUI tool, so here is a quick guide to connecting to an ssh instance from windows using putty. Make sure you save the key in ssh rsa-2 format.

What is this tmux thing? How do I create multiple bash terminals on my ssh instance?

We connect you to a tmux session by default for reliability. You can create a new bash terminal window with "ctrl+b,c" (press ctrl and b, followed by c), and switch between windows with "ctrl+b,n". For more info, search for "tmux cheat sheet" or "tmux guide".

Ok - but how can I disable tmux?

We don't recommend this generally as ssh (especially the proxy ssh connection) can be unstable, but if you know what you are doing and need raw ssh you can add the following to your onstart:

touch ~/.no_auto_tmux;

Security

How do you protect my data from other clients?

Clients are isolated to unprivileged docker containers and only have access to their own data.

How do you protect my data from providers?

There are many providers on Vast.ai, ranging from tier 4 datacenters with extensive physical and operational security down to individual hobbyists renting out a few machines in their home. Our vetted datacenter partners can provide data security similar to other large cloud providers. If data security is important for your use case, you may want to rent only from our datacenter partners.

Even though our smaller community providers generally do not have datacenter-level physical or operational security, they have little to gain and much to lose from stealing customer data. It can take months for providers to accumulate trust and verified status on Vast. These verified providers are thus incentivized to maintain their reputational standing just like larger cloud providers. Hosts generally have many different clients, and there are significant costs to identifying, saving, copying, and exploiting any interesting datasets, let alone any particular client's data. You can also roughly see the relative age of a provider by their ID.

Billing

How does billing work?

Once you enter a credit card and an email address and both are verified, you will receive a small amount of free test credit. Then you can increase your credit balance using one-time payments with the add credit button. Whenever your credit balance hits zero or below, your instances will be stopped automatically, but not destroyed.

You are still charged storage costs for stopped instances, so it is important to destroy instances when you are done using them.

Your credit card will be automatically charged periodically to pay off any outstanding negative balance.

Can you bill my card automatically so I don't have to add credit in advance?

You can set a balance threshold to configure auto billing, which will attempt to maintain your balance above the threshold. We recommend setting a threshold around your daily or weekly spend, and then setting a balance email notification threshold around 75% of that value, so that you get notified if the auto billing fails but long before your balance depletes to zero.

There is also an optional debit-mode feature which can be enabled by request for older accounts. When debit-mode is enabled, your account balance is allowed to go negative (without immediately stopping your instances).

I didn't enable debit-mode - what are these automatic charges to my card?

Your card is charged automatically regardless of whether or not you have debit-mode enabled. Instances are never free - even stopped instances have storage charges. Make sure you delete instances when you are done with them - otherwise, your card will continue to be periodically charged indefinitely.

How does pricing work?

There are separate prices and charges for:

Active rental (GPU) costs
Storage costs
Bandwidth costs

You are charged the base active rental cost for every second your instance is in the active/connected state. You are charged the storage cost (which depends on the size of your storage allocation) for every second your instance exists and is online, regardless of what state it is in: active, inactive, loading, etc. Stopping an instance does not avoid storage costs. You are charged bandwidth prices for every byte sent or received to or from the instance, regardless of what state it is in. The prices for base rental, storage, and bandwidth vary considerably from machine to machine, so make sure to check them. You are not charged active rental or storage costs for instances that are currently offline.
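The three charges can be added up in a back-of-the-envelope estimate. The sketch below works in hours and GB for readability, although billing is per second and per byte. Every rate in it is a made-up number, storage is treated as a flat hourly price for your allocation, and the function name is hypothetical.

```shell
# Arguments (all hypothetical inputs):
#   $1 active hours      $2 GPU price per hour
#   $3 hours the instance exists (active or stopped)   $4 storage price per hour
#   $5 GB transferred    $6 bandwidth price per GB
estimate_cost() {
    awk -v ah="$1" -v gp="$2" -v th="$3" -v sp="$4" -v gb="$5" -v bp="$6" \
        'BEGIN { printf "%.2f\n", ah * gp + th * sp + gb * bp }'
}

# 10 active hours at $0.40, 24 hours of storage at $0.01, 50 GB at $0.02 per GB
estimate_cost 10 0.40 24 0.01 50 0.02
```

In this example the storage term keeps accruing for the 14 hours the instance is stopped, which is why destroying unused instances matters.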

What is the billing frequency?

Balances are updated about once every few seconds.

Why should I trust vast.ai with my credit card info?

You don't need to: Vast.ai does not see, store, or process your credit card numbers; they are passed directly to Stripe (which you can verify in the javascript).

Do you support PayPal? What about cryptocurrency?

We currently support major credit cards through Stripe and crypto payments through Coinbase and crypto.com.

Data Movement

How do I upload/download to/from my instance?

You can use the CLI copy command to copy from/to directories on a remote instance and your local machine, or to copy data between two remote instances. You can use the copy buttons in the GUI to copy data between two remote instances. The copy command uses rsync and is generally fast and efficient, subject to single link upload/download constraints. Example:

./vast copy ~/workspace 4330147:/workspace

Currently, one endpoint of the copy must involve a vast instance with open ports. For a remote->local copy or local->remote copy, the remote instance must be on a machine with open ports (although the instance itself does not need open ports), and the remote instance can be stopped/inactive. For instances on machines WITHOUT open ports, copy to/from local is not available, but you can still copy to a 2nd vast instance with open ports.

For a remote->remote copy (copy between 2 instances), the src can be stopped and does not need open ports, but the dst must be a running instance with open ports. It is not sufficient that the instance is on a machine with open ports, the instance itself must have been created with open port mappings. If the instance is created with the direct connect option (for jupyter or ssh launch modes), the instance will have at least one open port. Otherwise, for proxy or entrypoint instance types, you can get open ports using the -p option to reserve a port in the instance configuration under run options (and you must also then pick a machine with open ports).

If your data is already stored in the cloud (S3, gdrive, etc) then you should naturally use the appropriate linux CLI or commands to download and upload data directly. This generally will be one of the fastest methods for moving large quantities of data, as it can fully saturate a large number of download links. If you are using multiple instances with significant data movement requirements you will want to use high bandwidth cloud storage and avoid any single machine bottlenecks.

If you launched a Jupyter notebook instance, you can use its upload feature, but this has a file size limit and can be slow.

You can also use standard Linux tools like scp, ftp, rclone, or rsync to move data. For moving code and smaller files scp is fast enough and convenient. However, be warned that the default ssh connection uses a proxy and can be slow for large transfers.

How do I upload/download to/from my instance - using scp?

If you launched an ssh instance, you can copy files using scp. The default ssh connection uses a proxy and thus can be slow (in terms of latency and bandwidth). Thus we recommend only using scp over the default ssh connection for smaller transfers (less than 1 GB). For larger inbound transfers, a direct connection is recommended. Downloading from a cloud data store using wget or curl can have much higher performance.

The relevant scp command syntax is:

scp -P PORT LOCAL_FILE root@IPADDR:/REMOTEDIR

The PORT and IPADDR fields must match those from the ssh command. The "Connect" button on the instance will give you these fields in the form:

ssh -p PORT root@IPADDR -L 8080:localhost:8080

For example, if Connect gives you this:

ssh -p 7417 root@52.204.230.7 -L 8080:localhost:8080

You could use scp to upload a local file called "myfile.tar.gz" to a remote folder called "mydir" like so:

scp -P 7417 myfile.tar.gz root@52.204.230.7:/mydir
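Note that ssh takes the port as lowercase -p while scp uses uppercase -P. The sketch below turns the ssh command shown by the Connect button into the matching scp upload command and prints it without running it. The function name is hypothetical, and it assumes the Connect string has the "ssh -p PORT root@IPADDR ..." form shown above.

```shell
# $1: ssh command as shown by Connect   $2: local file   $3: remote directory
# Prints the matching scp upload command.
scp_from_ssh() {
    port="$(printf '%s\n' "$1" | sed -n 's/.*-p \([0-9][0-9]*\).*/\1/p')"
    host="$(printf '%s\n' "$1" | sed -n 's/.* \(root@[^ ]*\).*/\1/p')"
    printf 'scp -P %s %s %s:%s\n' "$port" "$2" "$host" "$3"
}

scp_from_ssh 'ssh -p 7417 root@52.204.230.7 -L 8080:localhost:8080' myfile.tar.gz /mydir
```

To download instead, swap the last two scp arguments so the remote path comes first and the local destination second.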

I'm getting a ConnectionResetError downloading files?

This seems to be due to bugs in the urllib3 and/or requests libraries used by many Python packages. We recommend using wget to download large files - it is quite robust and recovers from errors gracefully.

How can I download a Kaggle dataset using wget?

First, you need to get the raw https link. Using the Chrome browser, on the Kaggle website go to the relevant dataset or competition page and start downloading the file you want. Then cancel the download and press Ctrl+J to bring up the Chrome Downloads page. At the top is the most recent download with a name and a link under it. Right-click on the URL link and use "copy link address". Then you can use wget with that URL as follows:

wget 'URL' --no-check-certificate -O FILENAME

Notice the URL needs to be wrapped in ' ' single quotes.

Examples

Disco Diffusion

Disco Diffusion is a powerful, free and open source AI image generator, which is easy to use on vast.ai. With the right settings and powerful GPUs, it can generate artist quality high-res images for a wide variety of subjects.

There are numerous options for running Disco Diffusion on vast, but two good options are 1.) using the pytorch docker image and a slightly modified notebook, or 2.) using a custom docker image (fork) made specifically to run DD in docker, such as jinaai/discoart. The latter custom docker image can spin up somewhat faster and has a number of advanced features beyond the original notebook, but currently requires cuda >= 11.6, which limits machine options.

The disco diffusion notebooks were created for colab, but they will run in docker on vast using the common pytorch image with some slight modifications (to install a few required libs). You can use one of our slightly modified DD notebooks (5.6, 5.4, 5.2) to get started quickly.

Here is a quick video tutorial.

You'll want to create a jupyter instance with the pytorch image, and you'll probably want about 30 GB of disk to store the various models and to save all the beautiful high-res images you will be generating. (This is important! Make sure to choose a storage allocation before creating the instance - as you currently can not resize the instance disk allocation and running out of space can be catastrophic.)

Download the notebook (5.6, 5.4, 5.2) as a .ipynb file, then upload it to your jupyter instance created from the pytorch/pytorch image. Click on the notebook to run it.

Then just fast-forward the notebook or step through each cell. It can take 15 minutes or so to download the models, depending on instance internet speed. The cell (4. Diffuse) near the end shows the in-progress output image. By default it only updates every 50 steps, but you can change this via the display_rate variable in that cell - setting it to 1 shows the result of each iteration.

Instead of the pytorch image, you can use the custom jinaai/discoart docker image. For this image we recommend adding the environment variable option -e JUPYTER_DIR=discoart (or JUPYTER_DIR=/) to your docker run options (directly under the image tag). This will instruct jupyter to start in a more sensible directory rather than /app which is basically empty. If you set this env variable you'll see the discoart.ipynb notebook file is already there in the /discoart folder, no need to upload.

If you are using JupyterLab you can select multiple files (using the shift key), and then download all of them one after another. Jupyter Notebook only supports downloading individual files, and neither JupyterLab nor Notebook supports downloading folders. To more conveniently download folders or a number of files, you can use the command line zip tool. First open a new terminal, and then in the terminal run the following:

apt-get install -y zip

And then to zip all of the files in the default images_out/TimeToDisco directory:

zip all_images.zip images_out/TimeToDisco/*

Note: do not use spaces in your folder names, they cause headaches on linux! Use the '_' underscore instead.

For discussion/help/advice running DD on vast find us on our discord, and make sure to check out the main DD discord.

Stable Diffusion

Stable Diffusion is a newer image diffusion generator which is generally much faster than Disco Diffusion, and requires less RAM. It is easy to use on Vast.ai with the Automatic1111 web UI.

Simply select the Stable Diffusion recommended template. To do so, navigate to the create page, click the edit image & config button and then the recommended tab. From there the stable diffusion template is visible. Once selected, it will then load the correct image and port settings for the web UI to load.

Select a 1X GPU instance. Once the instance starts up, simply click the Open button to open the web UI interface.

Nvidia-GLX-Desktop

Nvidia-GLX-Desktop is a docker image which provides a virtual desktop with GPU acceleration. On some (but not all) machines it requires specifying an external webRTC TURN server. To run this image on vast, just use the recommended docker image. Navigate to the create page, click the edit image & config button and then the recommended tab. Select the GLX template.

The default username is user and the default password is mypasswd. You can change these with -e env variables (see the linked page for details).

NVIDIA GLX requires a fast internet connection on both the host and local machines. Make sure to use an Inet Up and Inet Down filter set appropriately to ~300Mbps. Open the README file on the GLX template for more information.

Bittensor

Bittensor is a decentralized, blockchain-based machine learning network. The latest recommended template uses Bittensor version 3.70 (Finney) and installs Cubit on start. Vast has a wide variety of affordable GPUs which are ideal for running the Bittensor GPU miner.

The setup is mostly straightforward. Bittensor expects an open port which is identity mapped (external and internal are the same), which is possible on vast using out of range 'virtual' ports. That is handled in the recommended template using -p to set up the virtual ports.

Navigate to the create page, click the edit image & config button and then the recommended tab. Pick the Bittensor recommended docker image. That will configure the latest version of Bittensor along with the correct port settings.

Open up the README file associated with the template. Follow the instructions to create a cold and hot wallet on your local machine and to then run Bittensor. The first step is to register on the Bittensor network.

For more info see the bittensor installation documentation.

Troubleshooting

All my instances keep stopping, switching to inactive status, even though I didn't press the stop button. What's going on?

Check your credit balance. If it hits zero or below, your instances will be stopped automatically.

I keep getting this error: spend_rate_limit. What's going on?

There is a spend rate limit for new accounts. The limit is extremely small for unverified accounts, so make sure to verify your email. The limit increases over time, so try a cheaper instance type or wait a few hours. If you are still having trouble, use the online support chat in the lower right.

I tried to connect with ssh and it asked for a password. What is the password?

There is no ssh password, we use ssh key authentication. If ssh asks for a password, typically this means there is something wrong with the ssh key that you entered or your ssh client is misconfigured.

On Ubuntu or Mac, first you need to generate an rsa ssh public/private keypair using the command:

ssh-keygen -t rsa

Next you may need to force the daemon to load the new private key, and confirm it's loaded:

ssh-add; ssh-add -l

Then get the contents of the public key with:

cat ~/.ssh/id_rsa.pub

Copy the entire output to your clipboard, then paste that into the "Change SSH Key" text box under console/account. The key text includes the opening "ssh-rsa" part and the ending "user@something" part. If you don't copy the entire thing, it won't work.

Example SSH key text:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDdxWwxwN5Lz7ubkMrxM5FCHhVzOnZuLt5FHi7J8zFXCJHfr96w+ccBOBo2rtBBTTRDLnJjIsKLgBcC3+jGyZhpUNMFRVIJ7MeqdEHgHFvUUV/uBkb7RjbyyFcb4BCSYNggUZkMNNoNgEa3aqtBSzt47bnuGqqszs9bfDCaPFtr9Wo0b8p4IYil/gfOYBkuSVwkqrBCWrg53/+T2rAk/02mWNHXyBktJAu1q7qTWcyO68JTDd0sa+4apSu+CsJMBJs3FcDDRAl3bcpiKwRbCkQ+N6sol4xDV3zQRebUc98CJPh04Gnc41W02lmdqGL2XG5U/rV8/JM7CawKiIz3dbkv bob@velocity

I stopped my instance, and now when I try to restart it the status is stuck on "scheduling". What is wrong?

When you stop an instance, the gpu(s) it was using may get reassigned. When you later try to restart the instance, it tries to get those gpu(s) back - that is the "scheduling" phase. If another high priority job is currently using any of the same gpu(s), your instance will be stuck in the "scheduling" phase until the conflicting jobs are done. We know this is not ideal, and we are working on ways to migrate containers across gpus and machines, but until then we recommend not stopping an instance unless you are ok with the risk of waiting a while to restart it.

Hosting

How much money can a host make?

It's complicated; it depends on many factors (hardware performance, price, reliability, etc).

You can estimate your hardware's earning potential by comparing to similar hardware already rented on Vast.ai. Pricing statistics over time are tracked on 500farm. Or go to the create console page and select "Include Unavailable Offers" and the nvidia/opencl image to see most instance types, including those fully rented.

Hosts can run low priority jobs on their own machines, so there is always a fallback when high priority jobs are not available.

How and when will I be paid for hosting?

For users in the US and internationally, we support payout to a bank account (ACH) via Stripe Connect. Check to see if your country is supported here. Hosts can also receive payout through PayPal or Wise. Due to various transaction fees, there is a minimum payout of $20 (or equivalent in other currencies). Host billing runs on a regular weekly schedule every Friday. Host invoices are then paid out the following week, depending on bank transfer times.

Do you support payout in any crypto-currencies?

No, not at this time.

What is the revenue/fee structure?

Hosts receive 75% of the revenue earned from successful jobs, with 25% kept by Vast.ai.
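
As a rough illustration of the split, the arithmetic can be sketched in Python. The 75% share and the $20 minimum payout come from the answers above; the function names and the sample revenue figures are hypothetical and are not part of any Vast.ai API. Actual payouts may also be affected by transaction fees and currency conversion, which this sketch ignores.

```python
HOST_SHARE = 0.75        # stated in this FAQ: hosts receive 75% of job revenue
MIN_PAYOUT_USD = 20.00   # stated in this FAQ: minimum payout of $20


def host_earnings(gross_revenue_usd: float) -> float:
    """Host's share of the revenue from successful jobs (hypothetical helper)."""
    return round(gross_revenue_usd * HOST_SHARE, 2)


def meets_minimum_payout(balance_usd: float) -> bool:
    """True if a balance is at or above the minimum payout (hypothetical helper)."""
    return balance_usd >= MIN_PAYOUT_USD


# Sample figures for illustration only.
for gross in (24.00, 100.00):
    earned = host_earnings(gross)
    print(f"gross ${gross:.2f} -> host ${earned:.2f}, "
          f"payable: {meets_minimum_payout(earned)}")
```

In this sketch, $24 of gross revenue leaves the host with $18, which is below the minimum payout, while $100 of gross revenue leaves $75.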

What happens if I turn off my machine or just lose internet during a compute job?

Hosts are expected to provide reliable machines. We track data on disconnects, outages, and other errors; this data is then used to estimate a host machine's future reliability. These reliability score estimates are displayed on the listing cards and are also used as a factor in the default 'auto' ranking criteria.

What security measures protect my machine and my network?

Guests are contained to an isolated operating system image using Linux containers. Containers provide the right combination of performance, security, and reliability for our use case. The guest only has access to devices and resources explicitly granted to them by their contract.

Will guests be able to determine my IP address?

We do not by default prevent a guest from finding your router or NAT's external facing IP address by visiting some third party website, as this would require a full proxy network and all the associated bandwidth charges. It is essential that guests be able to download large datasets affordably. For many users a properly configured NAT/firewall should already provide protection enough against any consequences of a revealed IP address. For those who want additional peace of mind, we suggest configuring a separate network for your hosted machines. (But do make sure they can reach each other locally!)

How do I set prices?

There are two prices to consider: the max price and the min price. The max price is what on-demand rentals pay, and as a host you can set that price on the Host/Machines page with the Set Prices button. You can also set a min bid price for your machine by creating an idle job at that price on the Host/Create Job page. If you don't want to set up a true mining idle job, you can just use "ubuntu" as the image and "bash" as the command. See the Host Setup page for more info on idle jobs.

Someone is renting my machine for less than my price, what's happening?

They are using a bid. The price that hosts set on the Host/Machines page is not the rental price; it is the maximum rental price. On-demand instances pay the max price, but interruptible instances use a bidding system. You can control the min bid price by setting up an idle job. Alternatively, you can use the CLI to set a per-machine min bid (reserve) price.
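
The relationship between the two prices can be sketched as a small Python model. This is a simplification for illustration, not Vast.ai's actual pricing logic: it assumes an accepted interruptible rental pays exactly its bid and that bids below the host's min bid are rejected. The function name, the rental type strings, and the prices are all hypothetical.

```python
from typing import Optional


def hourly_rate(rental_type: str, max_price: float, min_bid: float,
                bid: Optional[float] = None) -> Optional[float]:
    """Return the hourly rate under this simplified model, or None if rejected.

    Assumptions (not confirmed by the FAQ): an accepted interruptible rental
    pays its own bid, and a bid below the host's min bid is not accepted.
    """
    if rental_type == "on-demand":
        # On-demand rentals pay the host's max price.
        return max_price
    if rental_type == "interruptible":
        # Interruptible rentals bid; the min bid acts as a reserve price.
        if bid is None or bid < min_bid:
            return None
        return bid
    raise ValueError(f"unknown rental type: {rental_type!r}")


# Hypothetical host settings: max price $0.50/hr, min bid $0.20/hr.
print(hourly_rate("on-demand", 0.50, 0.20))
print(hourly_rate("interruptible", 0.50, 0.20, bid=0.25))
print(hourly_rate("interruptible", 0.50, 0.20, bid=0.10))
```

Under these assumptions, a renter bidding $0.25/hr on a machine listed at $0.50/hr pays less than the listed price, which is the situation described in the question.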

How do I remove a GPU from my machine?

Removing GPUs is currently not supported. If you really need to remove a GPU, you will need to unlist the machine and wait until it has 0 rentals. Then, when it is safe, you can recreate the machine. You can do this by deleting the file: /var/lib/vastai_kaalia/machine_id

What will the stability of earnings be like?

The demand for DL compute has grown steadily and significantly in the last few years; most market analysts expect this growth to continue for the foreseeable future, and Nvidia's stock has skyrocketed accordingly. Demand for general GPU compute is less volatile than demand for cryptocurrency hashing. The stability of any particular host's earnings naturally depends on their hardware relative to the rest of the evolving market.

The slowdown in Moore's Law implies that hardware will last longer in the future. Amazon is still running Tesla K80s profitably now, almost 4 years after their release, and the Kepler architecture they use is now about 6 years old.

What operating systems are supported for hosting?

Initially we are supporting Ubuntu Linux, more specifically Ubuntu 16.04 LTS. We expect that deep learning is the most important initial use case, and currently the deep learning software ecosystem runs on Ubuntu. If you are a Windows or Mac user, don't worry: Ubuntu is easy and quick to install. If you are a current Windows user, it is also simple to set up Ubuntu in a dual-boot mode. Our software automatically helps you install the required dependencies on top of Ubuntu 16.04.

Hardware

What are the hardware requirements for hosting?

Technically, if our software detects recent/decent Nvidia GPUs (GTX 10XX series), we will probably allow you to join, but naturally that doesn't guarantee any revenue. What truly matters is your hardware's actual performance on real customer workloads, which can be estimated from benchmarks.

Deep Learning is GPU-intensive but also requires some IO and CPU performance per GPU to feed them with data. Multi-GPU systems are preferable for faster training through parallelization but also require more total system performance in proportion, and parallel training can require more PCIe bandwidth per GPU in particular. Rendering and many other workloads have similar requirements.

What kind of hardware works best for Deep Learning?

It depends heavily on the model and libraries used; it's constantly evolving; it's complicated. We suggest looking into the various deep learning workstations offered today for some examples, and see this in-depth discussion on Hacker News. GPU workstations built for deep learning are similar to those built for rendering or other compute-intensive tasks.
