To set up a multi-node cluster on Vast, you can rent machines from the same datacenter and test their connectivity to ensure they can communicate effectively.
Alternatively, you can create a cluster with machines from different datacenters or even across the globe, though this will typically increase latency. Generally, latency ranks from lowest to highest across these setups:
1) Multiple instances on the same machine
2) Instances on different machines within the same datacenter
3) Instances on machines in different datacenters or across the world
Each instance or container in Vast is allocated one GPU, so it’s possible to run several instances/containers on a single machine if it has multiple GPUs.
For example, you could deploy five instances on a machine equipped with 5 x RTX 4090 GPUs.
If scale matters more to you than latency, it is most likely better to look for GPU rigs within the same datacenter, in different datacenters, or across the globe and connect them.
For example, if you need roughly 100 RTX 4090s, you could rent six 12 x RTX 4090 rigs and four 10 x RTX 4090 rigs, which gives you 112 GPUs in total.
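The rig mix in this example can be checked with a quick shell sketch. The counts come from the example; the function and variable names are illustrative and not part of any Vast tooling.

```shell
#!/bin/sh
# Rig mix from the example: six 12-GPU rigs and four 10-GPU rigs.
# total_gpus and target are illustrative names, not Vast commands.
target=100

total_gpus() {
    # $1 = number of 12-GPU rigs, $2 = number of 10-GPU rigs
    echo $(( $1 * 12 + $2 * 10 ))
}

total=$(total_gpus 6 4)
echo "total GPUs: $total (target: $target)"
```

Changing the two arguments lets you try other rig combinations against the same target.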
Machines that belong to a datacenter typically have the blue datacenter label on them. You can find the datacenter's ID next to "datacenter:".
In this example, both machines belong to datacenter 3497: the top instance runs on machine 11877 and the bottom instance runs on machine 13280.
Once the blue button showing the ip:port range says "OPEN", click it to find each instance's public IP address.
The public IP address is shown in the pop-up.
Use this command to install the ping utility, either in a Jupyter terminal or in your own terminal after connecting to the instance over SSH:
apt update && apt install -y iputils-ping
Ping the public IP address of another rented machine.
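A minimal sketch of the ping check, assuming the iputils `ping` installed in the previous step. `check_peer` is a hypothetical helper name, and the address you pass to it is the public IP taken from the other instance's pop-up.

```shell
#!/bin/sh
# check_peer is a hypothetical helper, not a Vast command.
check_peer() {
    # $1 = public IP address of the other rented instance
    # -c 4: send four echo requests, then exit
    # -W 2: wait at most two seconds for each reply
    if ping -c 4 -W 2 "$1" >/dev/null 2>&1; then
        echo "reachable: $1"
    else
        echo "no reply: $1"
        return 1
    fi
}
```

Call it as `check_peer <public-ip>`, or run `ping -c 4 <public-ip>` directly if you want to see the round-trip times for each reply.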
If ping reports replies from that address, your ping was successful and you were able to connect to the machine.
If you don't get any replies, try machines from a different datacenter. A ping can fail for reasons such as a firewall or a corporate network blocking the ping packets.
Make sure to expose the ports your servers will run on by editing your template and adding the port you want exposed before you rent your instance.
You can find the public ip:port mapped to the port you exposed in the Open Ports section of the instance's pop-up.
In this example, 185.62.108.226:41525 is the public ip:port mapped to port 8081 in the instance/container.
You can use curl to send a request to 185.62.108.226:41525; the traffic is forwarded to port 8081 inside the instance/container.
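A minimal sketch of that curl request, assuming the server listening on port 8081 speaks plain HTTP (the original does not say which protocol it uses). `endpoint_url` and `probe_endpoint` are hypothetical helper names; the address and port are the ones from the example.

```shell
#!/bin/sh
# Public ip:port from the example; substitute the pair shown for your instance.
PUBLIC_IP="185.62.108.226"
PUBLIC_PORT="41525"

# endpoint_url is a hypothetical helper; it assumes a plain-HTTP server.
endpoint_url() {
    # $1 = public IP, $2 = public port
    echo "http://$1:$2/"
}

# probe_endpoint is a hypothetical helper that sends one request.
probe_endpoint() {
    # -sS: hide the progress meter but still print errors
    # --max-time 5: give up after five seconds
    curl -sS --max-time 5 "$(endpoint_url "$1" "$2")"
}
```

Run `probe_endpoint "$PUBLIC_IP" "$PUBLIC_PORT"` from another instance or from your own machine. The request goes to the public port, not to 8081, because the mapping is applied on the host side.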