
Vast.ai is thrilled to announce the launch of our Serverless offering for GPU workloads, enabling automatic, pay-per-use scaling for AI inference without hyperscaler pricing or fixed infrastructure constraints.
With Vast.ai Serverless, users run inference workloads through a fully serverless API on Vast's globally distributed GPU cloud. That means no manual instance management and no capacity planning. Instead, the platform automatically applies predictive optimization and flexible scaling across a diverse GPU fleet, so teams can deploy production AI systems that scale on demand while staying cost-efficient.
How does this serverless model work in practice? Let's take a closer look!
Vast.ai Serverless provides access to a wide range of GPUs, from RTX-class consumer cards to enterprise-grade accelerators like the A100, H100, and B200. Rather than managing individual instances, you simply define performance targets, and Vast handles the rest.
Behind the scenes, Vast.ai continuously benchmarks GPUs across a global network of more than 17,000 GPUs hosted by 1,400+ providers in over 500 locations worldwide. As workloads run, the platform dynamically selects, provisions, and routes jobs to the most efficient hardware available at that moment.
The key is predictive optimization.
Most serverless platforms react to demand after it appears. Vast.ai Serverless goes further.
Our predictive optimization feature analyzes historical usage patterns, real-time load, and ongoing market benchmarking to anticipate demand before it peaks. Based on those signals, the platform proactively provisions GPU workers that balance cost and latency -- ready to activate on demand rather than scrambling to scale once performance has already degraded.
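The core idea can be illustrated with a toy sketch. This is not Vast's actual (proprietary) predictor; it simply shows the shape of the approach: forecast near-term demand from recent request rates, then pre-provision enough workers to cover the predicted peak with headroom.

```python
import math

# Toy illustration of predictive capacity planning (not Vast's actual
# algorithm): forecast demand from a moving average of recent request
# rates, then pre-provision workers to cover the forecast with headroom.

def forecast_demand(recent_rates, window=3):
    """Simple moving-average forecast of requests/sec."""
    window = min(window, len(recent_rates))
    return sum(recent_rates[-window:]) / window

def workers_needed(predicted_rps, per_worker_rps, headroom=1.25):
    """Workers required to serve the forecast, with a safety margin."""
    return math.ceil(predicted_rps * headroom / per_worker_rps)

# Request rates observed over the last few minutes (req/sec).
recent = [40, 55, 70]
predicted = forecast_demand(recent)                   # (40+55+70)/3 = 55.0
print(workers_needed(predicted, per_worker_rps=10))   # ceil(55*1.25/10) = 7
```

The point of pre-provisioning on the forecast, rather than on current load, is that the extra workers are already warm when the peak arrives.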
These reserve workers help avoid the laggy cold starts and unpredictable costs common with other GPU serverless offerings, and they spare you from paying for excess GPU capacity that sits idle.
Unlike traditional serverless systems that restrict endpoints to a single hardware profile, Vast.ai Serverless supports multiple Workergroups per Endpoint. A Workergroup is a collection of GPU workers that are managed as a logical unit to automatically adjust capacity to meet demand.
In our Serverless offering, each Workergroup in an Endpoint can specify different GPU types or hardware configurations. This lets a single API endpoint serve workloads using whichever Workergroup is most cost-effective or the best fit for your performance targets. For example, lighter requests might route to consumer GPUs, while heavier inference jobs scale onto H100s -- with no manual intervention needed.
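As a sketch of that routing logic, an Endpoint with two Workergroups might look like the following. The field names, GPU prices, and selection rule here are illustrative assumptions, not Vast's documented API schema:

```python
# Hypothetical sketch of one Endpoint with two Workergroups. Field names
# and prices are illustrative, not Vast's actual API schema.
endpoint = {
    "name": "llm-inference",
    "workergroups": [
        {"name": "consumer",   "gpu": "RTX 4090", "vram_gb": 24, "price_per_hr": 0.35},
        {"name": "datacenter", "gpu": "H100",     "vram_gb": 80, "price_per_hr": 2.40},
    ],
}

def pick_workergroup(endpoint, required_vram_gb):
    """Route to the cheapest Workergroup whose GPUs fit the job."""
    candidates = [wg for wg in endpoint["workergroups"]
                  if wg["vram_gb"] >= required_vram_gb]
    return min(candidates, key=lambda wg: wg["price_per_hr"])

print(pick_workergroup(endpoint, 16)["name"])   # lighter job  -> "consumer"
print(pick_workergroup(endpoint, 60)["name"])   # heavier job  -> "datacenter"
```

A small request fits on the consumer group and routes there for cost; a large one exceeds 24 GB of VRAM and falls through to the H100 group automatically.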
In practice, this makes it possible to optimize performance and cost-efficiency in real time within a single deployment. And that brings us to pricing.
Vast.ai Serverless is the lowest-cost autoscaling GPU cloud on the market today.
Our Serverless workloads are billed per second, with support for On-Demand, Interruptible, and Reserved pricing. There are no tiers and no limits -- and all you need is $5 to get started.
Because Vast draws from a competitive global market of GPU providers, pricing reflects real supply and demand rather than preset SKUs. As demand scales up, Vast.ai Serverless automatically selects the most cost-effective GPUs available. When demand drops, resources are released immediately, and billing stops.
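Per-second billing is easy to reason about with back-of-the-envelope arithmetic. The hourly rate below is an illustrative figure, not a quoted Vast price:

```python
# Back-of-the-envelope cost of per-second billing (illustrative rate,
# not a quoted Vast.ai price).
def cost(hourly_rate, busy_seconds):
    """You are billed only for seconds a worker is actually running."""
    return hourly_rate / 3600 * busy_seconds

# A traffic burst keeps a GPU busy for 90 seconds at a $2.40/hr rate.
print(round(cost(2.40, 90), 2))   # 0.06 -- vs $2.40 for a full reserved hour
```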
As a result, teams can run production workloads at a fraction of the cost of traditional providers with centralized infrastructure. However, lower costs don't mean lower standards. Security and compliance remain core to our platform.
Vast.ai is built with security and compliance as foundational principles. Our platform is backed by SOC 2 Type II certification, with audits every 12 months to maintain continuous coverage.
For customers with the highest security requirements, Vast.ai offers a tiered compliance structure with our Secure Cloud offering. In the Secure Cloud, workloads run on isolated instances with direct SSH, CLI, and API access, hosted by our vetted datacenter partners that meet ISO 27001 standards at minimum. Many partners maintain additional certifications. (Full details are available on our compliance page.)
Other enterprise security features -- including private VPN access, optional audit trails, and enterprise-grade compliance support -- can be enabled as needed.
In all cases, data sovereignty remains fully in your control. Models, data, and workloads persist only as long as you choose and can be deleted at any time.
With Vast.ai Serverless, you can go from zero to compute in seconds. In short: create an Endpoint, attach one or more Workergroups with your performance targets, and send requests; the platform routes each request to the most efficient GPU available.
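A minimal client sketch might look like the following. The endpoint URL, route, and payload fields are hypothetical placeholders, not Vast's documented API:

```python
import json
import urllib.request

# Hypothetical endpoint URL and payload shape -- placeholders only,
# not Vast's documented API.
ENDPOINT_URL = "https://example.invalid/v1/endpoints/llm-inference/infer"

def build_request(prompt, api_key):
    """Assemble a POST request for a serverless inference endpoint."""
    payload = json.dumps({"input": prompt}).encode()
    return urllib.request.Request(
        ENDPOINT_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Hello, world", api_key="sk-demo")
print(req.get_method(), req.full_url)
```

From the client's point of view that single request is the whole interface; worker selection, scaling, and billing all happen behind the endpoint.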
That's all there is to it! From request to response, Vast.ai handles all of the heavy lifting. You also gain access to rich metrics and debugging tools, including logs and Jupyter/SSH access.
Our Serverless offering makes it easy to run AI inference without managing infrastructure or overpaying for capacity. Vast.ai Serverless gives you a simple and straightforward path from experiment to production -- and it's ready when you are.
Leverage Vast's distributed GPU fleet on your own terms. Check out our Serverless Overview and start building today!