Autogroup Templates

Please note that the Autoscaler is currently in Beta and is subject to change and downtime.

Below you can find the template_hash for the autoscaler-compatible templates that were created to allow users to run different images with the autoscaler. These templates take care of all the configuration required to run the images in API mode, search for machines to run the images on, and communicate successfully with the autoscaling server. For each Vast template, we have defined a couple of autoscaler-compatible variants that allow different models to be used. With each template_hash, we have also included the command used to create the template with the CLI. If you would like to change the environment or other parameters for an autogroup, you can create a new template by modifying the relevant variables in the create template command, and then set your autogroup to use that template by specifying the template_hash of the newly created template.
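For example, once you have the new template_hash, you can point an autogroup at it when creating the autogroup. A minimal sketch, assuming the CLI's create autogroup command and typical values for its load-management options (the hash shown is a placeholder, not a real template):

./vast.py create autogroup --template_hash <new_template_hash> --min_load 1 --target_util 0.9 --cold_mult 2.5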

TGI #

This image is used for serving LLMs; the relevant environment variables used by this template are described here.
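The MODEL_ARGS variable holds the flags that the launch script passes through to TGI's launcher, so serving a different model is mostly a matter of changing --model-id. A minimal sketch of the relevant --env fragment, assuming a different GPTQ-quantized model repository (the model shown is illustrative, not one of the templates below):

-e MODEL_ARGS='--model-id TheBloke/Mistral-7B-Instruct-v0.1-GPTQ --quantize gptq'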

tgi-llama2-7B-quantized : 3f19d605a70f4896e8a717dfe6b517a2

./vast.py create template --name "tgi-llama2-7B-quantized" --image "ghcr.io/huggingface/text-generation-inference:1.0.3" --env "-p 3000:3000 -e MODEL_ARGS='--model-id TheBloke/Llama-2-7B-chat-GPTQ --quantize gptq'" --onstart-cmd 'wget -O - https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/scripts/launch_tgi.sh | bash' --search_params "gpu_total_ram>=10 gpu_total_ram<20 inet_down>128 direct_port_count>3 disk_space>=20" --disk 8.0 --ssh --direct --explain

tgi-llama2-70B-quantized : a5751331d44ce6e9762561181430d670

./vast.py create template --name "tgi-llama2-70B-quantized" --image "ghcr.io/huggingface/text-generation-inference:1.0.3" --env "-p 3000:3000 -e MODEL_ARGS='--model-id TheBloke/Llama-2-70B-chat-GPTQ --quantize gptq'" --onstart-cmd 'wget -O - https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/scripts/launch_tgi.sh | bash' --search_params "gpu_total_ram>=40 gpu_total_ram<50 inet_down>128 direct_port_count>3 disk_space>=80" --disk 80.0 --ssh --direct --explain

AUTOMATIC1111 SD WebUI #

The ai-dock Stable Diffusion WebUI image (README) is used as the base for running the vast-pyworker server for Stable Diffusion. Certain environment variables and parameters had to be changed to allow the image to run in API mode. You can find more information on the environment variables in the README. You can install different models by changing the PROVISIONING_SCRIPT environment variable, as described in the README.

sdauto : be921cfaff0e266d630e1200ec807d61

./vast.py create template --name "sdauto" --image ghcr.io/ai-dock/stable-diffusion-webui:latest-jupyter --env '-e DATA_DIRECTORY=/opt/ -e JUPYTER_DIR=/ -e WEBUI_BRANCH=master -e WEBUI_FLAGS="--xformers --api --api-log --nowebui" -e SERVERLESS=true -e JUPYTER_PASSWORD=password -e PROVISIONING_SCRIPT="https://raw.githubusercontent.com/ai-dock/stable-diffusion-webui/main/config/provisioning/default.sh" -p 3000:3000' --onstart-cmd 'wget -O - https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/scripts/launch_sdauto.sh | bash' --disk 32 --ssh --jupyter --direct --search_params "gpu_total_ram>=24 gpu_total_ram<30 inet_down>128 direct_port_count>3 disk_space>=32.0"
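For example, to have workers install a different set of models, you could create a variant of this template that changes only the provisioning script. A minimal sketch, following the elided-parameter style used below (the URL is a hypothetical placeholder for a script you host; all other flags stay as in the command above):

./vast.py create template --name "sdauto-custom" ... --env '... -e PROVISIONING_SCRIPT="https://example.com/my-provisioning.sh" ...' ...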

Customizing Template Search Parameters #

If you are changing the model that an autogroup uses, you might need to change the machine configuration it searches for so that the machine specs suit your model. To do so, create a new template with a new --search_params option, as well as a new --disk option if you are increasing the disk space you want your instances to have.

Here is an example (with only the relevant parameters shown):

./vast.py create template --name "sdauto-large" ... --search_params "gpu_total_ram>=50 gpu_total_ram<60 inet_down>128 direct_port_count>3 disk_space>=50.0" --disk 50.0 ...

Customizing Template Code #

All autoscaler compatible templates created by Vast use the vast-pyworker framework, which is documented here. This framework allows for communication between the worker instances and the autoscaler server. The framework can be extended to allow worker instances to run custom code, which is described here.