NVIDIA Rubin Platform: Everything We Know So Far

June 29, 2026
4 Min Read
By Team Vast

NVIDIA has officially unveiled its next-generation AI platform, Rubin - and it's clearly aimed at shifting the economics of large-scale AI.

Bringing together six new chips to slash training time and inference token costs, Rubin "takes a giant leap toward the next frontier of AI," says NVIDIA CEO Jensen Huang.

Here's what we know about the platform - and what's still speculation.

Rubin Platform: AI's Next Leap Forward

The NVIDIA Rubin platform takes a whole new approach to supercomputing.

Today, always-on AI factories must continuously sustain enormous token throughput across distributed systems - serving the long contexts that agentic reasoning, complex workflows, and multimodal pipelines require - without slowing down under tight power and cost constraints.

Scaling these workloads is all about eliminating bottlenecks. Rubin is purposefully designed for that task. Its six-chip, rack-scale architecture is built around the idea that the datacenter itself, rather than a single GPU server, is the unit of compute.

To that end, Rubin uses extreme co-design as a foundational principle. GPUs, CPUs, networking, security, software, power delivery, and cooling are all designed together as a single system instead of being optimized in isolation.

At the chip level, the platform brings together:

  1. NVIDIA Rubin GPU - The primary AI accelerator, featuring a third-generation Transformer Engine with hardware-level adaptive compression.
  2. NVIDIA Vera CPU - An Arm-based processor with 88 NVIDIA Olympus cores and ultra-fast NVLink-C2C connectivity, built for high-bandwidth data movement and agentic reasoning across accelerated systems.
  3. Sixth-gen NVLink + NVLink Switch - A high-speed GPU-to-GPU interconnect that lets massive Mixture-of-Experts (MoE) models run across many GPUs as one system, enabling faster, more efficient AI inference at scale.
  4. ConnectX-9 SuperNIC - Advanced networking designed to move data efficiently between nodes and across clusters.
  5. BlueField-4 DPU - Infrastructure offload and security processing that enables confidential computing and workload isolation at scale.
  6. Spectrum-6 Ethernet - Next-gen AI networking built to sustain high-throughput, low-latency communication across large deployments.

It's an impressive combination, and NVIDIA's published performance figures show just how powerful the platform is.

NVIDIA Rubin GPU Specs and Performance Benchmarks

According to NVIDIA, each Rubin GPU delivers up to 50 petaFLOPS of NVFP4 compute for AI inference. Sixth-gen NVLink provides 3.6 TB/s of GPU-to-GPU bandwidth per GPU, scaling to 260 TB/s across a rack of 72 Rubin GPUs.
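Because NVIDIA quotes the 50 petaFLOPS NVFP4 and 3.6 TB/s NVLink figures per Rubin GPU, the rack-level numbers are simple multiples. The short sketch below multiplies those per-GPU peaks by the 72 GPUs in an NVL72 rack. It assumes ideal linear aggregation of peak figures and says nothing about throughput real workloads would achieve.

```python
# Per-GPU peak figures as published by NVIDIA; rack totals assume ideal aggregation.
GPUS_PER_RACK = 72            # Vera Rubin NVL72
NVFP4_PFLOPS_PER_GPU = 50.0   # peak NVFP4 inference compute per Rubin GPU
NVLINK_TBPS_PER_GPU = 3.6     # sixth-gen NVLink bandwidth per GPU


def rack_totals(gpus: int = GPUS_PER_RACK) -> dict:
    """Sum per-GPU peaks across one rack (no scaling losses are modeled)."""
    return {
        "nvfp4_exaflops": gpus * NVFP4_PFLOPS_PER_GPU / 1000,  # 1 EF = 1000 PF
        "nvlink_tbps": gpus * NVLINK_TBPS_PER_GPU,
    }


totals = rack_totals()
print(f"NVFP4 peak per rack: {totals['nvfp4_exaflops']:.1f} EF")
print(f"NVLink aggregate per rack: {totals['nvlink_tbps']:.1f} TB/s")
```

The NVLink product comes out at 259.2 TB/s, which NVIDIA rounds to the 260 TB/s quoted per rack, and the compute total works out to 3.6 exaFLOPS of NVFP4 per NVL72 rack.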

The platform's flagship innovation is the Vera Rubin NVL72 rack-scale system, where the entire rack serves as one accelerator within a larger AI factory.

Notably, Vera Rubin NVL72 is the first rack-scale platform to feature NVIDIA Confidential Computing. It maintains data security across CPU, GPU, and NVLink domains, protecting the world's largest proprietary models along with the training data and inference workloads that power them.

As context windows grow, so does the need to coordinate state across systems at gigascale. To address this issue, Rubin introduces a new inference context storage layer, with BlueField-4 enabling efficient sharing and reuse of key-value cache data across the infrastructure.
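To see why longer contexts strain memory, the usual KV-cache sizing formula is bytes = 2 × layers × KV heads × head dim × bytes per element × tokens (the 2 covers keys and values). The sketch below applies it to a hypothetical model config, then illustrates the general idea behind KV-cache reuse with a toy store that keys cached blocks by a chained hash of the token prefix, so a second request sharing the same prefix can skip recomputing it. This is the generic prefix-caching technique used by several inference servers, not a description of how BlueField-4 or NVIDIA's context storage layer actually works; `PrefixKVStore`, its methods, and the model parameters are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: float) -> float:
    """Size of the K and V tensors cached for `tokens` tokens of context."""
    # Factor of 2: one key and one value vector per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


class PrefixKVStore:
    """Hypothetical toy shared KV store keyed by a hash of the whole prefix."""

    def __init__(self, block_tokens: int = 16):
        self.block_tokens = block_tokens
        self._blocks: Dict[str, object] = {}

    def _block_keys(self, tokens: List[int]) -> List[str]:
        # Chained hash: the key for block i covers every token up to the end of
        # block i, so two requests share a key only if their prefixes match.
        h = hashlib.sha256()
        full = len(tokens) - len(tokens) % self.block_tokens  # drop partial tail
        keys = []
        for start in range(0, full, self.block_tokens):
            h.update(repr(tokens[start:start + self.block_tokens]).encode())
            keys.append(h.hexdigest())
        return keys

    def lookup(self, tokens: List[int]) -> int:
        """Number of leading tokens whose KV data is already cached."""
        hit = 0
        for key in self._block_keys(tokens):
            if key not in self._blocks:
                break
            hit += self.block_tokens
        return hit

    def insert(self, tokens: List[int], make_block: Callable[[int], object]) -> None:
        """Store KV blocks for a request, computing only blocks not yet cached."""
        for i, key in enumerate(self._block_keys(tokens)):
            if key not in self._blocks:
                self._blocks[key] = make_block(i)


# Hypothetical GQA config: 80 layers, 8 KV heads, head_dim 128, 1-byte (FP8) cache.
per_token = kv_cache_bytes(1, 80, 8, 128, 1)
long_ctx = kv_cache_bytes(128 * 1024, 80, 8, 128, 1)
print(f"{per_token / 1024:.0f} KiB per token, {long_ctx / 2**30:.0f} GiB for 128K tokens")

store = PrefixKVStore(block_tokens=16)
shared = list(range(32))                    # a common system prompt (made-up token IDs)
req_a = shared + list(range(1000, 1020))    # 52 tokens -> 3 full blocks cached
store.insert(req_a, lambda i: f"kv-block-{i}")
req_b = shared + list(range(2000, 2010))    # new request with the same 32-token prefix
print(store.lookup(req_b))                  # 32
```

Under these assumptions a single 128K-token context already needs about 20 GiB of KV cache, which is why sharing cached prefixes across requests and nodes, rather than recomputing them, becomes attractive as context windows grow.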

On top of that, the NVIDIA Rubin platform's second-generation RAS (Reliability, Availability, and Serviceability) engine provides real-time health checks, fault tolerance, and proactive maintenance. Maximizing system productivity is the goal, right down to tray design: the rack's modular, cable-free trays enable up to 18x faster assembly and servicing than the Blackwell generation.

Beyond the Official Specs

Of course, there are always rumors in the run-up to launch. How far will NVIDIA push the envelope?

Unofficial reports suggest that Rubin's TDP may be an astonishing 2.3 kW per GPU - up from the previously reported 1.8 kW. Memory bandwidth is reportedly jumping to 22 TB/s per GPU as well, potentially thanks to more aggressive HBM4 clocks.
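One way to see why the reported 22 TB/s figure matters is a roofline estimate. Dividing NVIDIA's official 50 petaFLOPS NVFP4 peak by the unconfirmed 22 TB/s gives the arithmetic intensity (FLOPs per byte moved) a kernel needs before it stops being memory-bound. The sketch assumes a simplified LLM decode step with 4-bit weights and ignores KV-cache traffic and NVFP4 scale-factor overhead, so the numbers are illustrative only.

```python
PEAK_NVFP4_FLOPS = 50e15   # official NVIDIA per-GPU NVFP4 inference peak
HBM_BYTES_PER_S = 22e12    # rumored per-GPU memory bandwidth (unconfirmed)


def attainable_flops(intensity: float,
                     peak: float = PEAK_NVFP4_FLOPS,
                     bw: float = HBM_BYTES_PER_S) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak, intensity * bw)


# Arithmetic intensity at which the compute and bandwidth limits meet.
ridge = PEAK_NVFP4_FLOPS / HBM_BYTES_PER_S
print(f"ridge point: {ridge:.0f} FLOPs/byte")  # 2273

# Simplified decode step: ~2 FLOPs per weight per token and 0.5 bytes per 4-bit
# weight, so intensity is about 4 FLOPs/byte for each token in the batch.
for batch in (1, 64, 1024):
    ai = 4.0 * batch
    print(f"batch {batch:5d}: ~{attainable_flops(ai) / 1e15:.3f} PFLOPS attainable")
```

Small-batch decoding sits far below the ridge point, so under this model its throughput scales almost directly with memory bandwidth; only very large batches reach the compute ceiling.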

The extra 500 W of per-GPU headroom could mean higher sustained boost clocks under continuous load and reduced throttling during long training runs. It could also result in more stable throughput for inference-heavy deployments, as well as higher rack-level performance density.

A power increase of this magnitude would ultimately enable more consistent performance under full stress. At hyperscale, that makes a huge difference.
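A rough calculation shows what the rumored TDP change would mean at rack scale. The sketch below multiplies the two unofficial per-GPU figures by the 72 GPUs in an NVL72 rack. It counts GPU power only (no CPUs, networking, or cooling), and the 100-rack fleet is a hypothetical example size.

```python
GPUS_PER_RACK = 72
# Unofficial per-GPU TDP figures from the reports above (not confirmed by NVIDIA).
REPORTED_TDP_KW = {"earlier report": 1.8, "latest rumor": 2.3}
FLEET_RACKS = 100  # hypothetical deployment size


def rack_gpu_power_kw(tdp_kw: float, gpus: int = GPUS_PER_RACK) -> float:
    """GPU-only power for one rack; excludes CPUs, NICs, switches, and cooling."""
    return gpus * tdp_kw


for label, tdp in REPORTED_TDP_KW.items():
    print(f"{label}: {rack_gpu_power_kw(tdp):.1f} kW of GPU power per rack")

delta = rack_gpu_power_kw(2.3) - rack_gpu_power_kw(1.8)
print(f"extra per rack: {delta:.1f} kW")
print(f"extra across {FLEET_RACKS} racks: {delta * FLEET_RACKS / 1000:.1f} MW")
```

At roughly 36 kW more GPU power per rack, a 100-rack deployment would need several additional megawatts of power and cooling capacity, which is why a change of this size matters at hyperscale.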

Conclusion

The NVIDIA Rubin platform shows what the next generation of AI factories will look like: rack-scale systems optimized for bandwidth, efficiency, and always-on sustained load.

But most teams don't operate their own AI factories. They need flexible and cost-effective access to high-performance GPUs today, across generations, without long-term commitments.

That's what Vast.ai was built for. Get instant access to H200s, B200s, and B300s now, with pay-as-you-go pricing - and scale as your needs evolve.