Docs - Autoscaler


The Autoscaler is currently in Beta and may experience changes, quirks, and downtime.

Use Vast’s Autoscaling system for a serverless solution that automatically provisions GPU workers to match the changing computational needs of your workloads. It provides efficient, cost-effective scaling for AI inference and other GPU computing tasks.

Key Features

  • Dynamic Scaling: Automatically scale your AI inference up or down based on customizable performance metrics.
  • Global GPU Fleet: Leverage Vast’s global fleet of powerful, affordable GPUs for your computational needs.
  • Fast Cold-Start Times: Minimize cold-start times with a reserve pool of workers that can spin up in seconds.
  • Metrics and Debugging: Access ample metrics and debugging tools for your serverless usage, including logs and Jupyter/SSH access.
  • Performance Exploration: Explore performance in depth to tune your autoscaling against both performance and price metrics.
  • Custom Worker Types: Define custom worker types through CLI search filters and create commands, supporting multiple worker types per endpoint.
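
To make the idea of metric-driven scaling with a reserve pool more concrete, here is a minimal sketch in Python. It is not the Autoscaler's actual algorithm or API: `ScalingPolicy`, `desired_workers`, `plan`, and every parameter name are hypothetical, chosen only to illustrate how a load metric could be turned into a worker count.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Every field is a hypothetical illustration parameter, not a Vast setting.
    target_utilization: float = 0.8  # desired fraction of worker capacity in use
    min_workers: int = 1             # never scale active workers below this
    max_workers: int = 16            # never scale active workers above this
    reserve_workers: int = 2         # extra workers held ready to cut cold starts


def desired_workers(load: float, per_worker_capacity: float,
                    policy: ScalingPolicy) -> int:
    """Return the active worker count that keeps utilization at or below target."""
    if per_worker_capacity <= 0:
        raise ValueError("per_worker_capacity must be positive")
    usable = per_worker_capacity * policy.target_utilization
    needed = math.ceil(load / usable)
    # Clamp the result to the configured bounds.
    return max(policy.min_workers, min(policy.max_workers, needed))


def plan(load: float, per_worker_capacity: float, active: int,
         policy: ScalingPolicy) -> dict:
    """Summarize the scaling action: positive delta scales up, negative scales down."""
    target = desired_workers(load, per_worker_capacity, policy)
    return {
        "active_target": target,
        "delta": target - active,
        "reserve_target": policy.reserve_workers,
    }


if __name__ == "__main__":
    policy = ScalingPolicy()
    # Assumed example: 100 requests/s of load, each worker handles 10 requests/s.
    print(plan(load=100, per_worker_capacity=10, active=10, policy=policy))
```

In this sketch the load and capacity can be in any unit, as long as both use the same one (for example requests per second or tokens per second). The reserve pool is kept separate from the active target so that a sudden rise in load can draw on workers that are already provisioned.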