The Vast PyWorker is a Python web server designed to run alongside a machine learning model instance, providing autoscaler compatibility. It serves as the primary entry point for API requests, forwarding them to the model's API hosted on the same instance. Additionally, it monitors performance metrics and estimates current workload based on factors such as the number of tokens processed, reporting these metrics to the autoscaler.
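Below is a minimal sketch of the kind of token-based workload estimate described above. It is illustrative only: the endpoint URL, payload fields, and class names are assumptions, not the PyWorker's actual reporting API.

```python
import time
import requests  # third-party HTTP client, assumed available

# Placeholder URL; the real autoscaler reporting endpoint and payload
# format are defined by the PyWorker framework and are not shown here.
AUTOSCALER_METRICS_URL = "https://autoscaler.example.invalid/report"


class LoadTracker:
    """Keeps a rough token-based workload estimate for one model instance."""

    def __init__(self) -> None:
        self.tokens_in_flight = 0

    def start_request(self, prompt_tokens: int, max_new_tokens: int) -> None:
        # Count the full potential cost of the request until it finishes.
        self.tokens_in_flight += prompt_tokens + max_new_tokens

    def finish_request(self, prompt_tokens: int, max_new_tokens: int) -> None:
        self.tokens_in_flight -= prompt_tokens + max_new_tokens

    def report(self) -> None:
        # Push the current estimate to the autoscaler; field names are invented.
        payload = {"timestamp": time.time(), "estimated_load": self.tokens_in_flight}
        requests.post(AUTOSCALER_METRICS_URL, json=payload, timeout=5)
```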
Vast's autoscaler templates use the Vast PyWorker. The PyWorker repository allows you to run custom code as an API server and integrate with Vast's autoscaling server, which manages server start and stop operations based on performance and error metrics. The PyWorker code runs on your Vast instance, and we automate its installation and activation during instance creation.
The Vast PyWorker wraps application-specific backend code and calls the appropriate backend function when the corresponding API endpoint is invoked. For example, if you are running a machine learning inference server, the backend code would implement the "infer" function for your model.
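As a rough illustration of what such a backend function might look like, the sketch below forwards an inference request to a model server on the same instance. The endpoint path, port, and payload shape are hypothetical; a real backend would match whatever API its model server exposes.

```python
import requests

# Hypothetical local model endpoint; a real backend targets whatever API the
# model server on the instance exposes (path, port, and schema will differ).
MODEL_API_URL = "http://127.0.0.1:5001/generate"


def infer(payload: dict) -> dict:
    """Forward an inference request to the model server on the same instance.

    A PyWorker backend would call something like this when its inference
    endpoint is hit, then record token counts for workload reporting.
    """
    response = requests.post(MODEL_API_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()


# Example usage (payload shape is illustrative only):
# result = infer({"prompt": "Hello", "max_new_tokens": 64})
```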
To use the PyWorker with a specific backend:
To integrate with Vast's autoscaling service, each backend must:
If you want to create your own backend and learn how to integrate with the autoscaling server, please refer to the following guides:
Vast has pre-created backends for popular model servers such as text-generation-inference and ComfyUI. These backends let you use the corresponding models in API mode, with performance and error tracking handled automatically, making them compatible with Vast's Autoscaler with no additional code required.
To get started with Vast-supported backends, see the PyWorker Backends Guide.
For more detailed information and advanced configuration, please visit the Vast PyWorker repository.