Docs - Pyworker

Extension Guide

Warning
The Autoscaler is currently in Beta and may experience changes, quirks, and downtime.

This guide covers the architecture of the tgi, sdauto, and ooba backends in the Vast PyWorker repository. It aims to help users modify these backends to support additional endpoints or create new backends with a similar architecture. This architecture is suitable for images where the inference code is accessible through an HTTP server in its own process, such as text-generation-inference.

For a backend that doesn't use this architecture, see the helloautoscaler backend guide.

Server Initialization #

Each backend uses a launch script to start the Vast PyWorker code, install required dependencies, and start the inference server with the necessary arguments. The start_server.sh script ensures all dependencies are installed, environment variables are set, and all component processes of the backend server are running correctly.

Launch scripts for text-generation-inference and stable-diffusion-webui can be found here and here.

Each backend requires specific environment variables, which are detailed in its respective README file.

These environment variables need to be set through the docker create arguments defined in the "launch_args" field of your autogroup, as described here.
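Once the container starts with those arguments, the PyWorker code can read the values like any other environment variable. The variable names below are purely illustrative; consult each backend's README for the ones it actually requires.

```python
import os

# Illustrative names only -- the real variables each backend expects are
# listed in that backend's README in the Vast PyWorker repository.
MODEL_LOG_PATH = os.environ["MODEL_LOG"]         # hypothetical: log file the inference server writes
BACKEND_NAME = os.environ.get("BACKEND", "tgi")  # hypothetical: which backend to launch
```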

Server Architecture #

Each backend has a dedicated directory containing its specific code, such as this one for tgi. Each directory includes the following files:

  • backend.py
  • metrics.py
  • logwatch.py

1) Backend (backend.py) #

The server.py script, launched by start_server.sh, sets up a Flask server that receives client requests and forwards them to the backend server. The custom functionality for your backend should be written in backend.py, which defines the endpoints and their associated handlers. Handlers must be declared in the flask_dict dictionary, with endpoint routes as keys, as shown here for text-generation-inference.

The custom backend class should inherit from the GenericBackend class in the top-level backend.py. The GenericBackend class provides methods for request formatting, signature checking, and handling client requests, as demonstrated in tgi/backend.py’s generate_handler.
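As a rough sketch of how these pieces fit together (GenericBackend and flask_dict come from the repository; the handler body and helper method names are illustrative assumptions, not the repository's exact API):

```python
from flask import jsonify, request

# GenericBackend lives in the top-level backend.py of the PyWorker repo;
# the helper calls below are assumptions made for illustration.
from backend import GenericBackend


class MyBackend(GenericBackend):
    """Receives client requests and forwards them to the local inference server."""

    def generate_handler(self):
        payload = request.get_json()
        if not self.check_signature(payload):                 # assumed signature-checking helper
            return jsonify({"error": "invalid signature"}), 401
        result = self.forward_request("/generate", payload)   # assumed request-forwarding helper
        return jsonify(result)


backend = MyBackend()

# server.py looks up the handler for each incoming route in flask_dict.
flask_dict = {
    "/generate": backend.generate_handler,
}
```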

2) Server Metrics (metrics.py) #

To handle metrics, create a metrics class in metrics.py that will be instantiated by your backend class. This class can inherit from the GenericMetrics class in the top-level metrics.py. It should manage request lifecycle events and control the metrics sent to the autoscaler server. An example for stable-diffusion-webui can be found here.
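A minimal sketch of such a class might look like the following. GenericMetrics is from the top-level metrics.py, while the lifecycle hook names and the report call are assumptions for illustration:

```python
import time

# GenericMetrics comes from the top-level metrics.py in the PyWorker repo;
# the hook and report method names below are illustrative assumptions.
from metrics import GenericMetrics


class MyMetrics(GenericMetrics):
    """Tracks in-flight requests and reports load to the autoscaler."""

    def __init__(self):
        super().__init__()
        self.inflight = 0
        self.started_at = {}

    def on_request_start(self, request_id):
        self.inflight += 1
        self.started_at[request_id] = time.time()

    def on_request_end(self, request_id):
        self.inflight -= 1
        latency = time.time() - self.started_at.pop(request_id, time.time())
        # Exactly which numbers the autoscaler expects depends on GenericMetrics;
        # this report call is a placeholder.
        self.report(load=self.inflight, last_latency=latency)
```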

3) Log Watch (logwatch.py) #

Write a custom LogWatch class for your backend in logwatch.py, inheriting from GenericLogWatch in the top-level logwatch.py. This class should monitor logs to determine when the backend server is ready to serve requests. A minimal example for stable-diffusion-webui is available here.
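In sketch form, a LogWatch class mostly needs to recognize the line your inference server prints once it is ready. GenericLogWatch is from the repository, while the method name and the readiness string below are assumptions:

```python
# GenericLogWatch comes from the top-level logwatch.py in the PyWorker repo;
# the method name and readiness hook below are illustrative assumptions.
from logwatch import GenericLogWatch


class MyLogWatch(GenericLogWatch):
    """Watches the inference server's log for a line that signals readiness."""

    def check_line(self, line: str) -> bool:
        # Hypothetical readiness marker -- match whatever your inference
        # server actually prints once it can serve requests.
        if "Model loaded, server listening" in line:
            self.model_ready()  # assumed hook telling PyWorker to accept traffic
            return True
        return False
```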

Performance Test #

More extensive logwatch classes, such as tgi/logwatch.py, may include a performance test to assess server capabilities. We have a performance test for LLMs in test_model.py, used here. This test is optional.
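For a rough idea of what such a test measures, the sketch below times a single generation request against a TGI-style /generate endpoint and derives a tokens-per-second estimate. The URL, payload shape, and numbers are illustrative, and the real test_model.py may differ:

```python
import time

import requests  # third-party: pip install requests


def simple_perf_test(url: str = "http://127.0.0.1:5001/generate") -> float:
    """Send one generation request and return an approximate tokens/second."""
    max_new_tokens = 64
    payload = {
        "inputs": "Benchmark prompt: describe a GPU in one sentence.",
        "parameters": {"max_new_tokens": max_new_tokens},  # TGI-style request body (assumed)
    }
    start = time.time()
    resp = requests.post(url, json=payload, timeout=120)
    resp.raise_for_status()
    elapsed = time.time() - start
    return max_new_tokens / elapsed  # rough upper bound on generation throughput
```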

For more detailed information and advanced configuration, please visit the Vast PyWorker repository.