Warning: The Autoscaler is currently in Beta, and is subject to changes, 'quirks', and downtime.
Worker instances run the vast-pyworker code, which has been configured to support a number of backends. Currently the supported backends are "tgi" for text-generation-inference, "sdauto" for stable-diffusion-webui, and "ooba" for text-generation-webui.
Each backend has its own designated launch script and a set of environment variables that it expects. To learn how to configure your autogroup to set these variables and launch correctly on instance creation, see the READMEs for the different backends in the vast-pyworker repo.
After launch, each of these images runs its inference code in a separate process that is accessible as an API server, and the vast-pyworker backend code acts as a wrapper around HTTP calls to that inference server. This means each backend can support any endpoint that the underlying inference API server supports, but each endpoint requires a wrapper function defined in the appropriate backend.py file to handle it.
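As a rough illustration of the wrapper idea, the sketch below forwards a request dictionary to an inference server running in its own process. It is not the vast-pyworker implementation: the address in `BACKEND_URL`, the helper `build_backend_request`, and the choice of a `generate` endpoint are assumptions made for this example, and the real backend.py files may be structured differently.

```python
import json
import urllib.request

# Assumed address of the inference API server running in a separate process.
BACKEND_URL = "http://127.0.0.1:5001"


def build_backend_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint of the inference server."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BACKEND_URL}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, timeout: float = 30.0) -> dict:
    """Hypothetical wrapper named after the underlying endpoint it forwards to.

    It takes the same arguments as the underlying endpoint and returns the
    decoded JSON response unchanged.
    """
    request = build_backend_request("generate", payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the forwarding logic easy to check without a running inference server.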
The process of adding an endpoint to an existing backend is described in the pyworker extension guide. The architecture described in the guide for the existing backends can also be used to write new backends that follow the model where the underlying inference code is accessible through an HTTP server in its own process.
These wrapper endpoints have the same name as the underlying endpoint, and should take the same input arguments and return the same output format. The one exception is that the authentication information returned by https://run.vast.ai/route/ must also be present in the request dictionary sent to the endpoint; it is filtered out before the request is forwarded to the underlying backend server. When Vast’s autoscaling server returns a server address from the /route/ endpoint, it also provides a unique signature for your request. When the request is sent to the vast-pyworker server, the authentication server verifies that the request was signed with the key corresponding to the autoscaling server. This ensures that unauthorized third parties cannot send requests to your server even if they somehow obtain its address, so you can be sure that the only requests served by the vast-pyworker server come from clients of your web application.
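The filtering step can be pictured with a small sketch. The field names in `AUTH_FIELDS` are placeholders invented for this example, not the actual keys returned by the /route/ endpoint, and signature verification itself is not shown because the scheme is not described here.

```python
from typing import Tuple

# Hypothetical names for the authentication fields; check the /route/ response
# for the real keys before relying on these.
AUTH_FIELDS = frozenset({"signature", "reqnum", "url"})


def split_auth(request: dict) -> Tuple[dict, dict]:
    """Separate authentication fields from the payload meant for the backend."""
    auth = {key: value for key, value in request.items() if key in AUTH_FIELDS}
    payload = {key: value for key, value in request.items() if key not in AUTH_FIELDS}
    return auth, payload


def build_signed_request(route_response: dict, payload: dict) -> dict:
    """Client side: merge the authentication fields from /route/ into the request."""
    auth, _ = split_auth(route_response)
    return {**payload, **auth}
```

In this sketch a client would call `build_signed_request` with the /route/ response and its own arguments, and the worker would call `split_auth` so that only the payload part reaches the underlying inference server.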