Autoscaler Debugging

Please note the Autoscaler is currently in Beta and is subject to change and downtime.

Worker errors

The vast-pyworker framework will automatically detect some errors, while others will cause the instance to time out. Once the autoscaler server has been notified that something is wrong with an instance, it will destroy (or reboot) it. If you want to manually debug an issue with an instance, check the instance logs, which are accessible via the logs button on the instance page in the GUI. All errors encountered while running the backend code, as well as errors in the vast-pyworker code itself, are directed to this log, so it is a good first place to look when something isn't working as expected. If that doesn't resolve the issue, there are dedicated log files on the instance in the /home/workspace/vast-pyworker directory.

Log files

The log files are split up according to the different functions of the vast-pyworker server:

- infer.log: logs from the backend inference server.
- auth.log: logs for vast-pyworker authorization, backend wrapper function calls, and performance update messages sent to the autoscaler server.
- watch.log: logs from monitoring of the backend inference server, along with the loaded and error messages sent to the autoscaler server.
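If you are connected to an instance over SSH, a quick way to watch all three logs at once is something like the sketch below. It assumes the default /home/workspace/vast-pyworker location; the follow helper is illustrative and not part of vast-pyworker itself.

```python
import os
import time

# Log files written by vast-pyworker (default install location).
LOG_DIR = "/home/workspace/vast-pyworker"
LOG_FILES = ["infer.log", "auth.log", "watch.log"]

def follow(paths):
    """Poll each log file and print new lines as they appear, prefixed by filename."""
    handles = {}
    for p in paths:
        if os.path.exists(p):
            f = open(p, "r")
            f.seek(0, os.SEEK_END)  # start at the end of the file, like `tail -f`
            handles[p] = f
    while True:
        for p, f in handles.items():
            line = f.readline()
            while line:
                print(f"[{os.path.basename(p)}] {line}", end="")
                line = f.readline()
        time.sleep(0.5)

if __name__ == "__main__":
    follow([os.path.join(LOG_DIR, name) for name in LOG_FILES])
```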

Dealing with increasing load

There are a variety of strategies you can use to plan for and deal with high load on your instances. If you are creating autogroups for an endpoint group that you anticipate will receive a lot of load, set test_workers high for those autogroups so that many instances are created up front. For these instances not to be destroyed if you don't have much load initially, however, you also need to set the cold_workers parameter of the endpoint group high enough to keep the idle workers around, as in the sketch below.
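For example, when first provisioning an endpoint group you expect to be busy, you might create it with a generous cold_workers value and give its autogroups a high test_workers value. The sketch below shells out to the vastai CLI from Python; the exact command names and flags are assumptions here, so check vastai --help for the real interface.

```python
import subprocess

def run(args):
    """Run a vastai CLI command and echo its output."""
    result = subprocess.run(["vastai"] + args, capture_output=True, text=True)
    print(result.stdout or result.stderr)

# Assumed commands/flags -- verify against `vastai --help` before use.
# Create the endpoint group, keeping up to 10 stopped ("cold") workers around.
run(["create", "endpoint", "--endpoint_name", "my-llm-endpoint",
     "--cold_workers", "10", "--max_workers", "30"])

# Create an autogroup under it that spins up 8 instances initially.
run(["create", "autogroup", "--endpoint_name", "my-llm-endpoint",
     "--test_workers", "8"])
```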

If you are receiving high levels of load on your endpoint and don't have many cold workers available to be started quickly, you can increase the cold_mult parameter of your endpoint group. The autoscaler creates and destroys instances based on an estimate of load further into the future, and this estimate is updated more slowly than the near-term load estimate. cold_mult controls how much future load the autoscaler predicts based on the current load. Increasing it during times of high load causes the autoscaler to create instances quickly, but you may want to decrease it again once enough instances have been created. If your endpoint group is unable to create the number of workers you need to serve your endpoint, make sure the max_workers parameter of the endpoint group is not set too low.
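Conceptually, cold_mult acts as a multiplier on the current load when forecasting future demand. The sketch below is a simplified illustration of that idea; the function name and formula are ours, not the autoscaler's actual implementation.

```python
def forecast_future_load(current_load: float, cold_mult: float) -> float:
    """Illustrative only: the autoscaler's long-horizon load estimate scales
    with current load, so raising cold_mult raises the forecast and causes
    more (cold) workers to be provisioned ahead of demand."""
    return current_load * cold_mult

# With 100 requests/s of current load:
print(forecast_future_load(100.0, 2.0))  # forecast 200 -> provision aggressively
print(forecast_future_load(100.0, 1.0))  # forecast 100 -> hold steady
```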

Dealing with decreasing load

The autoscaler generally does a good job of stopping instances quickly when your endpoint stops experiencing high levels of load. If your endpoint isn't experiencing much load and you have a number of stopped instances on your account that you no longer want to pay for, you can decrease the cold_workers parameter of the endpoint group.
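As with the earlier sketch, the command name and flags below are assumptions about the vastai CLI (verify against vastai --help); the point is simply that lowering cold_workers lets the autoscaler destroy stopped instances you no longer want to keep around.

```python
import subprocess

# Assumed command/flags -- verify against `vastai --help` before use.
# Reduce the number of stopped ("cold") workers kept around to 2.
subprocess.run(["vastai", "update", "endpoint", "--endpoint_name",
                "my-llm-endpoint", "--cold_workers", "2"])
```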