Warning: The Autoscaler is currently in Beta, and is subject to changes, 'quirks', and downtime.
For certain backends, vast-pyworker runs a performance test to estimate the maximum performance of a given worker. For LLMs, this is the maximum number of tokens per second the worker can generate across concurrent batches. The test can take several minutes to complete, depending on the machine, and you can follow its progress in the instance logs. Once the test completes, the results are saved. If the instance is rebooted, the saved results are loaded and the test is not run again.
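
The run-once-then-reuse behavior described above can be pictured with a minimal sketch. This is not vast-pyworker's actual code: the function name `load_or_run_perf_test`, the JSON key `max_tokens_per_second`, and the cache file location are all assumptions made for illustration.

```python
import json
import os
from typing import Callable


def load_or_run_perf_test(cache_path: str, run_test: Callable[[], float]) -> float:
    """Return a saved tokens-per-second result, running the test only if none exists.

    Hypothetical helper: the name, file format, and path are illustrative,
    not taken from vast-pyworker.
    """
    # After a reboot the saved file is still on disk, so the test is skipped.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return float(json.load(f)["max_tokens_per_second"])

    # First boot: run the (possibly multi-minute) test.
    result = float(run_test())

    # Write to a temporary file and rename it, so an interrupted write
    # never leaves a half-written results file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"max_tokens_per_second": result}, f)
    os.replace(tmp_path, cache_path)
    return result
```

With this shape, calling the helper twice with the same `cache_path` invokes `run_test` only on the first call. Deleting the saved file would force the test to run again on the next start.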