PyWorker

Introduction

Warning
The Autoscaler is currently in Beta and may experience changes, quirks, and downtime.

The Vast PyWorker is a Python web server designed to run alongside a machine learning model instance, providing autoscaler compatibility. It serves as the primary entry point for API requests, forwarding them to the model's API hosted on the same instance. Additionally, it monitors performance metrics and estimates current workload based on factors such as the number of tokens processed, reporting these metrics to the autoscaler.
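For a concrete picture of that flow, the sketch below shows one way such a proxy could look, assuming an aiohttp web server, a model API listening locally on port 5001, and a simple whitespace token count as the workload estimate; these details are illustrative assumptions, not the actual PyWorker implementation.

```python
# Illustrative sketch only; ports, paths, and the workload rule are assumptions.
from aiohttp import web, ClientSession

MODEL_URL = "http://127.0.0.1:5001/generate"  # assumed local model API

async def handle_generate(request: web.Request) -> web.Response:
    payload = await request.json()

    # Estimate the workload this request adds, e.g. by counting prompt tokens.
    workload = len(payload.get("prompt", "").split())

    # Forward the request to the model's API running on the same instance.
    async with ClientSession() as session:
        async with session.post(MODEL_URL, json=payload) as resp:
            body = await resp.json()

    # A real PyWorker would also report `workload` to the autoscaler here.
    return web.json_response(body)

app = web.Application()
app.add_routes([web.post("/generate", handle_generate)])

if __name__ == "__main__":
    web.run_app(app, port=3000)
```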

Overview #

Vast's autoscaler templates use the Vast PyWorker. The Vast PyWorker repository lets you run custom code as an API server and integrate with Vast's autoscaling server, which starts and stops instances based on performance and error metrics. The PyWorker code runs on your Vast instance, and we automate its installation and activation during instance creation.

Integration with Backend Code #

The Vast PyWorker wraps application-specific backend code and calls the appropriate backend function when the corresponding API endpoint is invoked. For example, if you are running a machine learning inference server, the backend code would implement the "infer" function for your model.
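As a hedged illustration of that split, the sketch below shows a hypothetical backend class whose infer method holds the model-specific logic, while the PyWorker only routes endpoints to it; none of these names come from the actual repository.

```python
# Hypothetical names for illustration only; the real PyWorker repository
# defines its own backend interface.
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    prompt: str

class EchoBackend:
    """Stand-in backend; a real one would call the model's library or API."""

    def infer(self, request: InferenceRequest) -> dict:
        # All model-specific logic lives here; the PyWorker only routes to it.
        return {"output": request.prompt.upper()}

# The PyWorker maps each API endpoint to the matching backend function:
ROUTES = {"/infer": EchoBackend().infer}
```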

To use the PyWorker with a specific backend (a sketch of these steps follows the list):

  1. Use a launch script that starts the PyWorker code.
  2. Install required dependencies for the backend code.
  3. Set up any additional requirements for your backend to run.
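A launch script is normally a shell script shipped with the template; purely for illustration, the Python sketch below walks through the same steps in the order a script would typically run them, with placeholder file and module names.

```python
# Placeholder commands only; a real Vast template ships its own launch script.
import subprocess
import sys

# Install the dependencies the backend code needs.
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)

# Handle any additional backend setup, e.g. fetching model weights.
subprocess.run([sys.executable, "download_weights.py"], check=True)

# Start the PyWorker server, which serves the API and reports metrics.
subprocess.run([sys.executable, "-m", "my_pyworker_entrypoint"], check=True)
```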

Communication with Autoscaler #

To integrate with Vast's autoscaling service, each backend must do the following (see the sketch after this list):

  • Send a message to the autoscaling server when the backend server is ready (e.g., after model installation).
  • Periodically send performance metrics to the autoscaling server to optimize server usage and performance.
  • Report any errors to the autoscaling server.
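The actual message format, endpoint, and authentication are defined by Vast's autoscaling service; as a rough sketch only, the snippet below shows the three kinds of reports with a placeholder URL and placeholder payload fields.

```python
# Placeholder endpoint and payload fields; the real autoscaler protocol is
# defined by Vast, not by this sketch.
import time
from typing import Optional

import requests

AUTOSCALER_URL = "https://autoscaler.example.com/report"

def report_ready(worker_id: str) -> None:
    # Sent once, after the backend server is ready (e.g. the model is installed).
    requests.post(AUTOSCALER_URL, json={"id": worker_id, "status": "ready"}, timeout=10)

def report_metrics(worker_id: str, current_load: float) -> None:
    # Sent periodically so the autoscaler can balance load across instances.
    payload = {"id": worker_id, "timestamp": time.time(), "current_load": current_load}
    requests.post(AUTOSCALER_URL, json=payload, timeout=10)

def report_error(worker_id: str, message: Optional[str]) -> None:
    # Sent whenever the backend hits an error the autoscaler should know about.
    requests.post(AUTOSCALER_URL, json={"id": worker_id, "error": message}, timeout=10)
```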

Getting Started #

If you want to create your own backend and learn how to integrate with the autoscaling server, see the guides in the sections below and in the Vast PyWorker repository.

Supported Backends #

Vast provides pre-built backends for popular inference servers such as text-generation-inference and stable-diffusion-webui. These backends let you run those applications in API mode, with performance and error tracking handled automatically, making them compatible with Vast's Autoscaler with no additional code required.

To get started with Vast-supported backends, see the PyWorker Backends Guide.

For more detailed information and advanced configuration, please visit the Vast PyWorker repository.