Worker Status

Warning
The Autoscaler is currently in Beta and may experience changes, quirks, and downtime.

To report its status to the autoscaler, the vast-pyworker framework must call the /worker_status/ endpoint on the autoscaler server. vast-pyworker sends three main types of messages to the autoscaler server: loaded messages, update messages, and error messages. notify.py in the vast-pyworker repository provides convenience functions for sending each of these message types.

Loaded Messages

Loaded messages should be sent when the server has finished loading and is ready to serve requests. Once the autoscaler confirms that the server is loaded, it adds the server to the queue of worker instances used to serve the autoscaler's /route/ endpoint. The following fields must be included in the request sent to the /worker_status/ endpoint for the loaded message to be processed:

  • loadtime: The time it took to load the model (seconds)
  • max_perf: The maximum rate at which the server can generate work units under full load (work units per second). For LLMs, for example, a work unit is a token.
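As an illustration of where these two values might come from, the sketch below times a model-loading callable and assembles the loaded-message payload. The helper names `timed_load` and `build_loaded_message` are hypothetical and are not part of vast-pyworker; how `max_perf` is measured (for example, by a benchmark run after loading) is left to the caller.

```python
import time


def timed_load(load_model, clock=time.perf_counter):
    """Call load_model() and return (model, seconds_elapsed).

    Hypothetical helper: `load_model` is any zero-argument callable that
    loads the backend model. `clock` is injectable so it can be faked in tests.
    """
    start = clock()
    model = load_model()
    return model, clock() - start


def build_loaded_message(instance_id, mtoken, loadtime, max_perf):
    """Assemble the fields described above for a loaded message."""
    return {
        "id": instance_id,
        "mtoken": mtoken,
        "loadtime": loadtime,  # seconds
        "max_perf": max_perf,  # work units per second under max load
    }


# Usage sketch with a stand-in loader; max_perf=500 is a placeholder value.
model, loadtime = timed_load(lambda: "model")
payload = build_loaded_message(1, "master_token", loadtime, max_perf=500)
```

The resulting dictionary can be serialized to JSON and posted to /worker_status/.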

Update Messages

Update messages are sent periodically to keep the autoscaler informed of the current level of client requests that the vast-pyworker instance is experiencing. The following fields must be included in the request sent to the /worker_status/ endpoint for the update message to be processed:

  • cur_load: The rate at which requested work units are coming into the server (work units per second)
  • num_requests_recieved: The total number of requests received by the server (number of requests)
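One possible way to produce these two values is to accumulate incoming work between update messages and divide by the elapsed time. The sketch below does that with a hypothetical `LoadTracker` class; averaging over the interval since the last update is an assumption, since the text above only says that cur_load is a rate in work units per second. The field name `num_requests_recieved` is spelled exactly as listed above.

```python
import time


class LoadTracker:
    """Accumulates incoming work between update messages (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._window_start = clock()
        self._window_units = 0.0  # work units requested since the last update
        self.num_requests = 0     # running total since startup

    def record_request(self, work_units):
        """Call once per incoming client request."""
        self._window_units += work_units
        self.num_requests += 1

    def build_update(self, instance_id, mtoken):
        """Return an update-message payload and start a new measurement window."""
        now = self._clock()
        elapsed = now - self._window_start
        # Assumption: cur_load is the average rate over the window just ended.
        cur_load = self._window_units / elapsed if elapsed > 0 else 0.0
        self._window_start = now
        self._window_units = 0.0
        return {
            "id": instance_id,
            "mtoken": mtoken,
            "cur_load": cur_load,
            "num_requests_recieved": self.num_requests,
        }
```

A worker would call `record_request` for each incoming request and `build_update` on a timer, posting the returned dictionary to /worker_status/ each time.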

Error Messages

Error messages are sent if an error occurs on the server that prevents it from functioning correctly. For example, if the backend code experiences an error and can no longer serve requests, an error message should be sent so that the autoscaler knows to either restart or destroy this instance, preventing client requests from being routed to it in the future. The only field necessary is:

  • error_msg: A description of the error that has occurred.
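A minimal sketch of how a worker might turn a backend failure into an error message follows. Both function names are hypothetical, and the format of the `error_msg` string (exception type followed by its text) is an assumption; the text above only requires a description of the error.

```python
def build_error_message(instance_id, mtoken, exc):
    """Assemble an error-message payload from a caught exception."""
    return {
        "id": instance_id,
        "mtoken": mtoken,
        # Assumed format: "<ExceptionType>: <message>"
        "error_msg": f"{type(exc).__name__}: {exc}",
    }


def run_backend_step(step, instance_id, mtoken):
    """Run one backend call.

    Returns (result, None) on success, or (None, error_payload) if the
    call raised, so the caller can report the failure to the autoscaler.
    """
    try:
        return step(), None
    except Exception as exc:
        return None, build_error_message(instance_id, mtoken, exc)
```

When the second element of the returned pair is not None, it can be posted to /worker_status/ so the autoscaler stops routing requests to the instance.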

Server Identification

All messages sent from the vast-pyworker to the /worker_status/ endpoint on the autoscaler must include the following two fields for identification and authentication purposes:

  • id: The ID of the instance that this vast-pyworker server is running on. This can be accessed through the CONTAINER_ID environment variable defined on every instance.
  • mtoken: When an instance is created, it is assigned a unique master token for authentication purposes. This token is passed into all Backend classes as an initializing argument and can also be accessed with the MASTER_TOKEN environment variable.
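For code that runs outside a Backend class, the two fields could be read from the environment variables named above. In this sketch, `identity_fields` is a hypothetical helper, and converting CONTAINER_ID to an integer is an assumption.

```python
import os


def identity_fields(env=None):
    """Return the id and mtoken fields that every message must carry.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("CONTAINER_ID", "MASTER_TOKEN") if name not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "id": int(env["CONTAINER_ID"]),  # assumption: the ID is sent as an integer
        "mtoken": env["MASTER_TOKEN"],
    }
```

The returned dictionary can be merged into any loaded, update, or error payload, for example `{**identity_fields(), "error_msg": "backend crashed"}`.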

Example: Calling the /worker_status/ Endpoint with a Loaded Message

```python
import json

import requests

# Loaded message: the identification fields plus loadtime and max_perf.
status_payload = {
    "id": 1,                   # instance ID (CONTAINER_ID)
    "mtoken": "master_token",  # master token (MASTER_TOKEN)
    "loadtime": 10.0,          # seconds taken to load the model
    "max_perf": 500,           # work units per second under max load
}

response = requests.post(
    "https://run.vast.ai/worker_status/",
    headers={"Content-Type": "application/json"},
    data=json.dumps(status_payload),
    timeout=4,
)

if response.status_code != 200:
    print(f"Failed to call /worker_status/, response.status_code: {response.status_code}")
```