Docs - Autoscaler

Pyworker Hello Autoscaler Guide

Warning: The Autoscaler is currently in Beta, and is subject to changes, 'quirks', and downtime.

To demonstrate how to use the vast-pyworker framework to set up an API and have Vast's Autoscaler automatically manage a fleet of instances serving it, we will walk through a minimal "hello autoscaler" example. It is recommended that you read the vast-pyworker helloworld example first to get an idea of how to create a new backend.

In this example we will create a new backend called "helloautoscaler", which lives in the helloautoscaler directory. As in "helloworld", we will walk through the two required components: backend.py and the launch_helloautoscaler.sh script.

1) backend.py #

```python
class Backend():
    def __init__(self, container_id, master_token, control_server_url, send_data):
        t1 = time.time()
        self.id = container_id
        self.master_token = master_token
        self.control_server_url = control_server_url
        self.count = 0
        self.num_requests_recieved = 0
        self.interval_requests_recieved = 0
        t2 = time.time()
        data = {"id" : self.id, "mtoken" : self.master_token}
        notify.loaded(data=data, autoscaler_address=self.control_server_url, load_time=t2 - t1, max_perf=0.5)
        self.update_interval = 10
        t1 = threading.Thread(target=self.send_data_loop)
        t1.start()
```

First, we set up the Backend class, which now contains considerably more state than the helloworld example. We accept all the arguments passed into the Backend class initializer by server.py; these are required for identification purposes when the vast-pyworker server sends messages to the autoscaler. To organize our communication with the autoscaler server, we use the notify module, which can be found here. Once our backend has finished initializing, we send the "loaded" message to the autoscaler server. control_server_url tells us the address at which we can reach the autoscaler. We also provide load_time and max_perf as arguments. Without load_time, the autoscaler won't know that the server has finished loading. max_perf is an important argument, as the autoscaler uses it to make management decisions about the amount of load this server can handle. More information about the autoscaler endpoint that the notify module calls can be found here. max_perf is measured in units of work per second, where an individual request can consist of multiple units of work. In this simple example we assume that each request takes one unit of work, so by passing a max_perf of 0.5, we are telling the autoscaler server that we expect this server to handle an average of half a request per second.
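To make the max_perf units concrete, here is a small arithmetic sketch, assuming (as this example does) that each request costs exactly one unit of work:

```python
# max_perf is measured in units of work per second.
max_perf = 0.5            # value passed to notify.loaded above
work_per_request = 1.0    # assumption made in this example

requests_per_second = max_perf / work_per_request
requests_per_minute = requests_per_second * 60

print(requests_per_second)  # 0.5
print(requests_per_minute)  # 30.0
```

So a max_perf of 0.5 tells the autoscaler this server can sustain roughly 30 such requests per minute.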

```python
def send_data_loop(self):
    while True:
        cur_load = self.interval_requests_recieved / self.update_interval
        data = {"id" : self.id, "mtoken" : self.master_token}
        notify.update(data, self.control_server_url, cur_load, self.num_requests_recieved)
        self.interval_requests_recieved = 0
        time.sleep(self.update_interval)
```

At the end of the Backend class initializer, we create a background thread that runs "send_data_loop" to update the autoscaler every 10 seconds. The notify.update() method sends this request, and the two arguments we must pass are cur_load and num_requests_recieved. The cur_load parameter is very important: it is an estimate of the amount of requested work per second the server is currently experiencing. num_requests_recieved is simply a count of the total number of requests the server has received over its lifetime.
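As a quick illustration of the cur_load calculation above: if the server saw 5 requests during a 10-second reporting interval, the load reported to the autoscaler would be 0.5 units of work per second (again assuming one unit of work per request):

```python
interval_requests_received = 5   # requests seen since the last update
update_interval = 10             # seconds between updates, as set in __init__

cur_load = interval_requests_received / update_interval
print(cur_load)  # 0.5
```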

```python
def track_request(self):
    self.num_requests_recieved += 1
    self.interval_requests_recieved += 1
```

To keep track of the number of requests received, we define the track_request helper function.

```python
def increment_handler(backend, request):
    backend.track_request()
    request_dict = request.json
    if "amount" in request_dict.keys():
        backend.count += request_dict["amount"]
        return "Incremented"
    abort(400)

def value_handler(backend, request):
    backend.track_request()
    return {"value" : backend.count}

flask_dict = {
    "POST" : {
        "increment" : increment_handler
    },
    "GET" : {
        "value" : value_handler
    }
}
```

Here we define the functionality for the two endpoints that the backend supports, "/increment" and "/value". Both use "track_request" to update the internal request metrics, then perform their simple operations: incrementing the count and returning the count, respectively. Note that "/increment" checks for the expected keyword in request_dict, and if it isn't present, returns an error code meaning "Bad Request". From within your handler functions, you can make use of Flask-defined functions such as abort, used here.
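To see how these handlers behave, here is a minimal, self-contained sketch that exercises the same logic with stand-in objects. FakeBackend and FakeRequest are invented for this illustration; in the real framework, server.py supplies the Backend instance and Flask supplies the request (and abort raises an HTTP 400, which the sketch approximates with a plain return):

```python
class FakeBackend:
    """Stand-in for Backend, holding only the state the handlers touch."""
    def __init__(self):
        self.count = 0
        self.num_requests_recieved = 0
        self.interval_requests_recieved = 0

    def track_request(self):
        self.num_requests_recieved += 1
        self.interval_requests_recieved += 1


class FakeRequest:
    """Stand-in for a Flask request carrying a parsed JSON body."""
    def __init__(self, json_body):
        self.json = json_body


def increment_handler(backend, request):
    backend.track_request()
    request_dict = request.json
    if "amount" in request_dict.keys():
        backend.count += request_dict["amount"]
        return "Incremented"
    return None  # the real handler calls Flask's abort(400) here


def value_handler(backend, request):
    backend.track_request()
    return {"value": backend.count}


backend = FakeBackend()
print(increment_handler(backend, FakeRequest({"amount": 3})))  # Incremented
print(value_handler(backend, FakeRequest({})))                 # {'value': 3}
print(backend.num_requests_recieved)                           # 2
```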

2) launch.sh #

The launch script for the helloautoscaler backend is nearly identical to the one for the helloworld backend.

```shell
start_server() {
    if [ ! -d "$1" ]
    then
        wget -O - https://raw.githubusercontent.com/vast-ai/vast-pyworker/main/start_server.sh | bash -s $2
    else
        $1/start_server.sh $2
    fi
}

start_server /home/workspace/vast-pyworker helloautoscaler
```

As before, we call the start_server convenience function with the expected path of the vast-pyworker directory and the name of the backend we want to run.