
Autogroup Templates

Warning
The Autoscaler is currently in Beta and may experience changes, quirks, and downtime.

Below are template_hash values for various autoscaler-compatible templates. These templates handle all the necessary configuration to run different images in API mode, search for suitable machines, and communicate successfully with the autoscaling server.

Customizing Your Own Templates: To change the environment or other parameters for an autogroup, create a new template by modifying the relevant variables in the create template command, then point the autogroup at it by specifying the new template's template_hash.
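For example, here is a minimal sketch of wiring a new template into an autogroup. The model ID and flag values are illustrative placeholders, and the exact autogroup flags may differ by CLI version (check vastai create autogroup --help):

vastai create template --name "tgi-custom" --image "ghcr.io/huggingface/text-generation-inference:1.0.3" --env "-p 3000:3000 -e MODEL_ARGS='--model-id <your-model-id> --quantize gptq'" ...
vastai create autogroup --min_load 10 --target_util 0.9 --cold_mult 2.0 --template_hash "<template_hash of the new template>"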

TGI Templates #

The text-generation-inference image is used for serving LLMs. Relevant environment variables are described here.
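As a quick smoke test of a running worker, you can send a request in TGI's standard generate format. This is a sketch: substitute your instance's IP and the external port mapped to container port 3000, and note that on an autoscaler worker requests are normally routed through the vast-pyworker server rather than sent directly, so treat this as a debugging aid:

curl http://<INSTANCE_IP>:<PORT>/generate -X POST -H 'Content-Type: application/json' -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'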

tgi-llama2-7B-quantized #

template_hash: 3f19d605a70f4896e8a717dfe6b517a2

vastai create template --name "tgi-llama2-7B-quantized" --image "ghcr.io/huggingface/text-generation-inference:1.0.3" --env "-p 3000:3000 -e MODEL_ARGS='--model-id TheBloke/Llama-2-7B-chat-GPTQ --quantize gptq'" --onstart-cmd 'wget -O - `https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/scripts/launch_tgi.sh` | bash' --search_params "gpu_total_ram>=10 gpu_total_ram<20 inet_down>128 direct_port_count>3 disk_space>=20" --disk 8.0 --ssh --direct --explain

tgi-llama2-70B-quantized #

template_hash: a5751331d44ce6e9762561181430d670

vastai create template --name "tgi-llama2-70B-quantized" --image "ghcr.io/huggingface/text-generation-inference:1.0.3" --env "-p 3000:3000 -e MODEL_ARGS='--model-id TheBloke/Llama-2-70B-chat-GPTQ --quantize gptq'" --onstart-cmd 'wget -O - `https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/scripts/launch_tgi.sh` | bash' --search_params "gpu_total_ram>=40 gpu_total_ram<50 inet_down>128 direct_port_count>3 disk_space>=80" --disk 80.0 --ssh --direct --explain

AUTOMATIC1111 SD WebUI #

The ai-dock Stable Diffusion WebUI image (README) is used as the base for running the vast-pyworker server for Stable Diffusion. Environment variables and parameters are adjusted to run in API mode. You can install different models by changing the PROVISIONING_SCRIPT environment variable, as shown after the sdauto template below.

sdauto #

template_hash: be921cfaff0e266d630e1200ec807d61

vastai create template --name "sdauto" --image ghcr.io/ai-dock/stable-diffusion-webui:latest-jupyter --env '-e DATA_DIRECTORY=/opt/ -e JUPYTER_DIR=/ -e WEBUI_BRANCH=master -e WEBUI_FLAGS="--xformers --api --api-log --nowebui" -e SERVERLESS=true -e JUPYTER_PASSWORD=password -e PROVISIONING_SCRIPT="https://raw.githubusercontent.com/ai-dock/stable-diffusion-webui/main/config/provisioning/default.sh" -p 3000:3000' --onstart-cmd 'wget -O - https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/scripts/launch_sdauto.sh | bash' --disk 32 --ssh --jupyter --direct --search_params "gpu_total_ram>=24 gpu_total_ram<30 inet_down>128 direct_port_count>3 disk_space>=32.0"

Customizing Template Search Parameters #

To adjust machine configuration for different models, create a new template with updated --search_params and --disk options.

Here is an example (with only the relevant parameters shown):

vastai create template --name "sdauto-large" ... --search_params "gpu_total_ram>=50 gpu_total_ram<60 inet_down>128 direct_port_count>3 disk_space>=50.0" --disk 50.0 ...

Customizing Template Code #

All autoscaler-compatible templates use the Vast PyWorker framework, documented here. This framework handles communication between worker instances and the autoscaling server and can be extended to run custom code, as described here.