Worker Status Endpoint

Warning: The Autoscaler is currently in Beta, and is subject to changes, 'quirks', and downtime.

For the vast-pyworker framework to keep the autoscaler informed of its status, it must call the /worker_status/ endpoint on the autoscaler server. vast-pyworker sends three main types of messages to the autoscaler server: loaded messages, update messages, and error messages. The vast-pyworker repo provides convenience functions for sending these three types of messages.

Loaded Messages

Loaded messages should be sent when the server has finished loading and is ready to serve requests. Once the autoscaler has confirmed that the server is loaded, it adds the server to the queue of worker instances used to serve the /route/ endpoint on the autoscaler. The following fields must be included in the request sent to the /worker_status/ endpoint for the loaded message to be processed:

  • loadtime: The time it took to load the model (seconds)
  • max_perf: The number of work units per second the server can generate under maximum load. For example, in the context of LLMs, "work units" are tokens.
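As a minimal sketch of how these two fields might be filled in, the snippet below times a model load and assembles the body of a loaded message. The helper names build_loaded_payload and timed_load are hypothetical and are not part of vast-pyworker; the id and mtoken fields are the ones described under Server Identification.

```python
import time


def timed_load(load_fn):
    """Run load_fn and return (result, elapsed_seconds).

    Hypothetical helper: load_fn stands in for whatever loads the model.
    """
    start = time.monotonic()
    result = load_fn()
    return result, time.monotonic() - start


def build_loaded_payload(instance_id, mtoken, loadtime, max_perf):
    """Assemble the body of a loaded message (hypothetical helper)."""
    return {
        "id": instance_id,
        "mtoken": mtoken,
        "loadtime": float(loadtime),  # seconds spent loading the model
        "max_perf": float(max_perf),  # work units per second under max load
    }
```

A caller could measure loadtime with timed_load and pass it straight to build_loaded_payload; max_perf has to come from a benchmark or a known figure for the backend.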

Update Messages

Update messages are sent periodically to keep the autoscaler informed of the current level of client requests that the vast-pyworker instance is experiencing. The following fields must be included in the request sent to the /worker_status/ endpoint for the update message to be processed:

  • cur_load: The rate at which requested work units are coming in to the server (work units per second)
  • num_requests_recieved: The total number of requests received by the server
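The source does not say how cur_load should be measured. One possible approach, sketched below under that assumption, is to sum the work units requested since the previous update and divide by the elapsed time. The LoadTracker class is hypothetical and is not taken from vast-pyworker.

```python
import time


class LoadTracker:
    """Track incoming work between update messages (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._window_start = clock()
        self._window_units = 0   # work units requested since the last update
        self.num_requests = 0    # total requests since startup

    def record_request(self, work_units):
        """Call once per incoming client request."""
        self._window_units += work_units
        self.num_requests += 1

    def build_update_payload(self, instance_id, mtoken):
        """Return the update message body and start a new measurement window."""
        now = self._clock()
        elapsed = now - self._window_start
        cur_load = self._window_units / elapsed if elapsed > 0 else 0.0
        self._window_start = now
        self._window_units = 0
        return {
            "id": instance_id,
            "mtoken": mtoken,
            "cur_load": cur_load,  # work units per second
            "num_requests_recieved": self.num_requests,  # spelling as in the field name
        }
```

The clock is injectable so the rate calculation can be checked without waiting in real time.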

Error Messages

Error messages are sent if an error occurs on the server that keeps it from functioning correctly. For example, if the backend code experiences an error and can no longer serve requests, an error message should be sent so that the autoscaler knows to either restart or destroy this instance and stops routing client requests to it. The only required field is:

  • error_msg: A description of the error that has occurred.
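A minimal sketch of turning a caught exception into an error message body follows. The helper name build_error_payload and the "ExceptionType: message" format are illustrative assumptions; the source only requires that error_msg describe the error.

```python
def build_error_payload(instance_id, mtoken, exc):
    """Assemble the body of an error message from an exception (hypothetical helper)."""
    return {
        "id": instance_id,
        "mtoken": mtoken,
        # Assumed format: exception class name followed by its message.
        "error_msg": f"{type(exc).__name__}: {exc}",
    }


def describe_failure(instance_id, mtoken, backend_call):
    """Run backend_call; return an error payload if it raises, else None."""
    try:
        backend_call()
    except Exception as exc:  # any failure is treated as an unhealthy backend here
        return build_error_payload(instance_id, mtoken, exc)
    return None
```

The returned dictionary would then be sent to the /worker_status/ endpoint in the same way as the other message types.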

Server Identification

All messages sent from the vast-pyworker to the /worker_status/ endpoint on the autoscaler must include the following two fields for identification and authentication purposes:

  • id: This is the id of the instance that this vast-pyworker server is running on. This can be accessed through the CONTAINER_ID environment variable that is defined on every instance.

  • mtoken: When an instance is created, it is assigned a unique master token for the purposes of authentication. This token is passed into all Backend classes as an initializing argument, and can also be accessed with the MASTER_TOKEN environment variable.
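The two identification fields can be read from the environment variables named above. The sketch below assumes that CONTAINER_ID holds an integer-like string; the helper names worker_identity and with_identity are hypothetical.

```python
import os


def worker_identity(environ=os.environ):
    """Return the id and mtoken fields from the environment (hypothetical helper)."""
    return {
        "id": int(environ["CONTAINER_ID"]),  # assumption: the id is numeric
        "mtoken": environ["MASTER_TOKEN"],
    }


def with_identity(fields, environ=os.environ):
    """Merge message-specific fields with the identification fields."""
    payload = worker_identity(environ)
    payload.update(fields)
    return payload
```

For example, with_identity({"error_msg": "backend crashed"}) would produce a complete error message body on an instance where both variables are set.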

Below is an example of calling the /worker_status/ endpoint with a loaded message:

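    import json
    import requests

    # Placeholder: the autoscaler address is missing from this snippet;
    # replace it with the address of your autoscaler server.
    AUTOSCALER_URL = "https://autoscaler.example.com"

    def send_loaded_message():
        status_payload = {
            "id": 1,                   # instance id (CONTAINER_ID)
            "mtoken": "master_token",  # master token (MASTER_TOKEN)
            "loadtime": 10.0,          # seconds
            "max_perf": 500,           # work units per second
        }
        response = requests.post(
            AUTOSCALER_URL + "/worker_status/",
            headers={"Content-Type": "application/json"},
            data=json.dumps(status_payload),
            timeout=4,
        )
        if response.status_code != 200:
            print(f"Failed to call /worker_status/, response.status_code: {response.status_code}")
            return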