Performance Tracking

Warning
The Autoscaler is currently in Beta and may experience changes, quirks, and downtime.

For certain backends, Vast PyWorker runs a performance test to estimate the maximum throughput of a given worker. For Large Language Models (LLMs), this means determining the maximum number of tokens per second that the worker can generate across concurrent batches.
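To make the idea of "tokens per second across concurrent batches" concrete, here is a minimal sketch of how such a measurement could be structured. It is not the PyWorker implementation: the function names, the concurrency levels, and the `fake_generate` stand-in for a model request are all assumptions made for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def measure_throughput(
    generate: Callable[[], Awaitable[int]],
    concurrency: int,
) -> float:
    """Run `concurrency` requests at once and return tokens generated per second."""
    start = time.perf_counter()
    token_counts = await asyncio.gather(*(generate() for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    return sum(token_counts) / elapsed


async def estimate_max_throughput(
    generate: Callable[[], Awaitable[int]],
    concurrency_levels: tuple[int, ...] = (1, 2, 4, 8),
) -> float:
    """Try several concurrency levels and keep the best observed tokens/sec."""
    best = 0.0
    for level in concurrency_levels:
        best = max(best, await measure_throughput(generate, level))
    return best


async def fake_generate() -> int:
    """Hypothetical stand-in for a model request: pretends to emit 50 tokens."""
    await asyncio.sleep(0.01)
    return 50


if __name__ == "__main__":
    tokens_per_second = asyncio.run(estimate_max_throughput(fake_generate))
    print(f"estimated max throughput: {tokens_per_second:.0f} tokens/s")
```

In a real test, `fake_generate` would be replaced by a call to the model backend that returns the number of tokens it actually produced.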

This performance test may take several minutes to complete, depending on the machine's specifications. Progress can be monitored through the instance logs. When the test completes, the results are saved, so if the instance is rebooted it loads the saved results instead of running the test again.
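The save-and-reload behavior described above follows a common "run once, then cache" pattern. The sketch below illustrates that pattern only. The file name, the JSON layout, and the `load_or_run_benchmark` helper are hypothetical and are not taken from the PyWorker source.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_run_benchmark(cache_path: Path, run_benchmark: Callable[[], float]) -> float:
    """Return the saved result if one exists; otherwise run the test and save it."""
    if cache_path.exists():
        # A previous run already saved its result, so skip the benchmark.
        return json.loads(cache_path.read_text())["max_tokens_per_second"]
    result = run_benchmark()
    cache_path.write_text(json.dumps({"max_tokens_per_second": result}))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "perf_results.json"  # hypothetical file name
        first = load_or_run_benchmark(path, lambda: 1234.5)  # runs the benchmark
        second = load_or_run_benchmark(path, lambda: 0.0)    # served from the saved file
        print(first, second)
```

With this pattern, deleting the saved file is what would force the test to run again. Whether PyWorker exposes an equivalent mechanism is something to confirm in its repository.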

For more detailed information and advanced configuration, please visit the Vast PyWorker repository.