For every backend, the Vast PyWorker runs a performance test to measure the maximum throughput of a given worker. For Large Language Models (LLMs), the test measures the maximum number of tokens that can be generated per second across concurrent batches. For image generation, the test counts the number of 512x512 grids required to cover the image resolution and treats each grid as equivalent to 175 tokens. This value is then scaled so that a system running Flux on a 4090 GPU achieves a standardized performance rating of 200 tokens per second.
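The image-generation accounting described above can be illustrated with a short sketch. This is not the PyWorker's actual code: the function name `image_token_equivalent` is hypothetical, and rounding partial grids up is an assumption based on the phrase "required to cover". The final scaling to the Flux-on-4090 reference is left out because the scaling factor is not stated here.

```python
import math

GRID_SIZE = 512        # side length of one grid, as stated in the text
TOKENS_PER_GRID = 175  # token equivalent of one grid, as stated in the text


def image_token_equivalent(width: int, height: int) -> int:
    """Return the token equivalent of one image before any scaling.

    Hypothetical helper: counts the 512x512 grids needed to cover the
    image (partial grids rounded up, which is an assumption) and
    multiplies by 175 tokens per grid.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    grids_wide = math.ceil(width / GRID_SIZE)
    grids_high = math.ceil(height / GRID_SIZE)
    return grids_wide * grids_high * TOKENS_PER_GRID


if __name__ == "__main__":
    # A 1024x1024 image needs 2 x 2 = 4 grids.
    print(image_token_equivalent(1024, 1024))
```

Under these assumptions, a 512x512 image counts as 175 tokens and a 1024x1024 image as 700 tokens.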
This performance test may take several minutes to complete, depending on the machine's specifications. Progress can be monitored through the instance logs. Once the test completes, the results are saved. If the instance is rebooted, the saved results are loaded and the test does not run again.
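The save-and-reload behavior can be pictured as a simple load-or-run pattern. The sketch below is only an illustration of that pattern: the function `load_or_run_benchmark`, the JSON file layout, and the `tokens_per_second` key are all hypothetical, and the PyWorker's real storage location and format may differ.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_run_benchmark(cache_path: Path, run_test: Callable[[], float]) -> float:
    """Return a saved benchmark result, running the test only if none exists.

    Hypothetical helper: `cache_path` and the JSON layout are assumptions
    made for illustration, not the PyWorker's actual file format.
    """
    if cache_path.exists():
        # A previous run already saved its result, so skip the test.
        saved = json.loads(cache_path.read_text())
        return float(saved["tokens_per_second"])

    # No saved result yet: run the (possibly slow) test and persist it.
    result = float(run_test())
    cache_path.write_text(json.dumps({"tokens_per_second": result}))
    return result
```

With this pattern, the first call runs the test and writes the file; later calls, including those after a reboot, read the file and return immediately, provided the file lives on storage that survives the reboot.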
For more detailed information and advanced configuration, please visit the Vast PyWorker repository.