Warning: The Autoscaler is currently in Beta, and is subject to changes, 'quirks', and downtime.
Vast’s vast-pyworker repository lets you run custom code as an API server and integrate with Vast’s autoscaling server, which starts and stops servers based on performance and errors. The vast-pyworker code runs on your Vast instance, and its installation and activation are automated on instance creation.
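To make the "custom code as an API server" idea concrete, here is a minimal sketch of a worker-style HTTP server that routes an endpoint to backend code. This is illustrative only: the handler, endpoint path, and response shape are assumptions, not the actual vast-pyworker API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class WorkerHandler(BaseHTTPRequestHandler):
    """Hypothetical request handler: dispatches POST /infer to backend code."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/infer":
            # Backend-specific inference logic would run here;
            # upper-casing the prompt stands in for a real model call.
            body = json.dumps({"output": payload.get("prompt", "").upper()})
            self.send_response(200)
        else:
            body = json.dumps({"error": "unknown endpoint"})
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):
        # Silence per-request logging for this sketch.
        pass


def serve(port=0):
    """Start the worker server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), WorkerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would then POST JSON to `/infer` on the instance, and vast-pyworker-style code would invoke the matching backend function.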
The vast-pyworker code wraps application-specific "backend" code and is responsible for calling the correct piece of backend code when the corresponding API endpoint is called. For example, if you are running a machine learning inference server, the backend code is responsible for implementing the "infer" function for your model. To use vast-pyworker with a specific backend, you must use a launch script that starts the vast-pyworker code, installs the backend's required dependencies, and sets up anything else the backend needs to run. To integrate with Vast's autoscaling service, each backend is responsible for notifying the autoscaling server when the backend server is ready (for example, a backend that installs a model on start-up would send this message once installation is complete), sending periodic performance metrics (which let the autoscaling server optimize your set of servers so that your performance per dollar is as high as possible), and informing the autoscaling server of any errors that occur.
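The three responsibilities above (ready notification, periodic metrics, error reports) can be sketched as a small reporter class. The endpoint paths, field names, and `send` callable below are all hypothetical placeholders, not the actual autoscaler protocol; in practice `send` would be an HTTP POST to the autoscaling server.

```python
import time


class AutoscalerReporter:
    """Hypothetical sketch of the messages a backend owes the autoscaling server."""

    def __init__(self, worker_id, send):
        self.worker_id = worker_id  # identifies this instance to the autoscaler
        self.send = send            # callable(path, payload); e.g. an HTTP POST

    def report_ready(self):
        # Sent once, e.g. after model download/installation completes.
        self.send("/worker_status", {"id": self.worker_id, "status": "ready"})

    def report_metrics(self, requests_served, avg_latency_s):
        # Sent periodically so the autoscaler can optimize performance per dollar.
        self.send("/worker_metrics", {
            "id": self.worker_id,
            "timestamp": time.time(),
            "requests_served": requests_served,
            "avg_latency_s": avg_latency_s,
        })

    def report_error(self, message):
        # Sent whenever the backend hits an error the autoscaler should know about.
        self.send("/worker_status", {
            "id": self.worker_id,
            "status": "error",
            "detail": message,
        })
```

Keeping the transport behind a `send` callable is just a convenience for this sketch; it makes the message shapes easy to inspect without a live autoscaling server.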
Additionally, Vast has created backends for popular images such as text-generation-inference and stable-diffusion-webui that let you use these images in API mode. These backends automatically handle performance and error tracking, so they work with Vast's Autoscaler with no extra code required. To learn how to get started with the Vast-supported backends, see here