Warning: The Autoscaler is currently in Beta, and is subject to changes, 'quirks', and downtime.
Vast.ai provides an Autoscaling system to manage serverless workers for AI inference and other GPU computing tasks. Autoscaling automates the provisioning of GPU workers to match the time-varying computational needs of dynamic workloads. The Autoscaler manages groups of instances - called autoscaling groups (autogroups) - to serve a horizontally scalable application, scaling up or down according to customizable usage metrics.
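To make the scale-up/scale-down behavior concrete, here is a minimal sketch of the kind of decision such a system makes. This is an illustrative, hypothetical rule, not Vast.ai's actual algorithm: the metric, thresholds, and bounds are assumptions, and the idea is simply to size the worker pool so each worker carries a target load.

```python
import math

def target_worker_count(current_load: float,
                        target_load_per_worker: float,
                        min_workers: int = 1,
                        max_workers: int = 100) -> int:
    """Return how many workers are needed for the observed load.

    current_load: aggregate usage metric (e.g. requests/sec) - hypothetical.
    target_load_per_worker: desired load each worker should handle.
    """
    needed = math.ceil(current_load / target_load_per_worker)
    # Clamp to the configured bounds of the group.
    return max(min_workers, min(max_workers, needed))
```

For example, with an aggregate load of 250 requests/sec and a target of 40 per worker, this rule asks for 7 workers; a real autoscaler would additionally smooth the metric over time to avoid thrashing.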
Efficient Inference: Dynamic load-balanced routing for cost-effective inference at scale, leveraging Vast's global fleet of powerful, low-cost GPUs.
Autoscaling: Dynamic scaling based on customizable (bring-your-own) application performance metrics for maximum efficiency at scale.
Containers/Templates: Use any container or template - the Autoscaler provides the management and load-balancing layers, but Autoscaling workers are just regular GPU instances.
Fast Cold-Start: To minimize cold-start times, the Autoscaler maintains a reserve pool of storage workers that can spin up in seconds (plus app- and image-specific model load times, which vary).
Metrics/Debugging: Autoscaling workers are regular GPU instances and thus support all the same features: metrics, logs, Jupyter/SSH access, etc.
Autogroups: You can define custom worker types through CLI search filters and create commands, with multiple worker types (autogroups) per endpoint.
Automatic Performance Exploration: Automate machine benchmarking and testing specific to your application to find machines with the best perf/price metrics.
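The perf/price exploration described above amounts to ranking candidate machines by performance per dollar. The sketch below shows that ranking step on made-up data; the benchmark numbers and field names are hypothetical, and "perf" stands in for whatever application-specific metric you measure (tokens/sec, images/sec, etc.).

```python
def best_perf_per_price(machines):
    """Pick the candidate with the highest perf/price ratio."""
    return max(machines, key=lambda m: m["perf"] / m["price_per_hour"])

# Hypothetical benchmark results for three candidate offers.
offers = [
    {"id": "a", "perf": 100.0, "price_per_hour": 0.50},  # 200 perf per $
    {"id": "b", "perf": 150.0, "price_per_hour": 0.90},  # ~167 perf per $
    {"id": "c", "perf": 60.0,  "price_per_hour": 0.25},  # 240 perf per $
]
best = best_perf_per_price(offers)  # offer "c" wins despite the lowest raw perf
```

Note that the cheapest or the fastest machine is not necessarily the winner; the ratio is what matters, which is why automated benchmarking across many machine types pays off.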