
H100 NVL vs. SXM5: NVIDIA's Supercomputing GPUs

- Team Vast

September 11, 2024
Cloud GPU Rentals · GPU Rental Platform · Vast.ai

In the high-end GPU market, NVIDIA's H100 accelerator family stands out as some of the most advanced hardware available today. Whether you're training AI models or deploying them, running scientific simulations, or performing large-scale data analytics, the H100 lineup delivers the performance these enormous workloads demand.

But which accelerator is right for you? Today we're looking at two options: the NVIDIA H100 NVL and the H100 SXM5. Let's dive into the differences and explore which one might be the better fit for your project needs.

The H100 NVL: Built for AI Inference at Scale

Announced in 2023, the NVIDIA H100 NVL is an interesting variant of the H100 PCIe card. At first glance, it's exactly what it looks like: two H100 PCIe cards shipped pre-bridged together.

This massive beast of a card spans four slots – with a configurable TDP of 700-800W to match its size. Since it's essentially two H100s joined together, host communication is handled by a pair of PCIe 5.0 x16 interfaces, one per GPU. Three NVLink bridges connect the two H100 GPUs directly, delivering 600GB/s of bidirectional bandwidth – about 4.5x what a single PCIe 5.0 x16 link offers.
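
As a quick sanity check on that ratio, here's a back-of-the-envelope calculation in Python. It assumes raw PCIe 5.0 x16 link rates of ~64GB/s per direction (~128GB/s bidirectional) with no protocol overhead:

```python
# Back-of-the-envelope: NVLink bridge bandwidth vs. a PCIe 5.0 x16 link.
# Assumes raw link rates with no protocol overhead.

nvlink_bidir_gbs = 600         # GB/s across the three NVLink bridges
pcie5_x16_bidir_gbs = 2 * 64   # GB/s for one PCIe 5.0 x16 link (~64 GB/s each way)

ratio = nvlink_bidir_gbs / pcie5_x16_bidir_gbs
print(f"NVLink vs. PCIe 5.0 x16: ~{ratio:.1f}x")  # ~4.7x raw, in line with the quoted ~4.5x
```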

NVIDIA states that the H100 NVL is capable of 2.4x to 2.6x the performance of two separate H100 PCIe cards (at least when it comes to FP8 and FP16 workloads) – likely in part because it uses faster HBM3 memory rather than HBM2e. Plus, unlike the standard H100 PCIe, each GPU on the H100 NVL has all six stacks of HBM memory enabled instead of just five. The result is a 17.5% per-GPU memory increase – and a total of 188GB of VRAM (2 x 94GB).
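
The arithmetic behind those memory figures is straightforward to check:

```python
# The standard H100 PCIe ships with five of six HBM stacks enabled (80 GB);
# the H100 NVL enables all six, for 94 GB per GPU.
pcie_gb = 80
nvl_per_gpu_gb = 94

print(f"Per-GPU increase: {(nvl_per_gpu_gb - pcie_gb) / pcie_gb:.1%}")  # 17.5%
print(f"Total NVL VRAM: {2 * nvl_per_gpu_gb} GB")                       # 188 GB
```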

A few other standout features include a ~5.1Gbps HBM3 memory clock, a 6144-bit memory bus, and 2 x 3.9TB/s of memory bandwidth.

According to NVIDIA, the H100 NVL is ideal for inferencing large language models (LLMs):

"These GPUs work as one to deploy large language models and GPT models from anywhere from five billion parameters to 200 [billion]," explained Ian Buck, NVIDIA's VP of Accelerated Computing.

So in addition to being able to handle various high-performance computing (HPC) tasks, the H100 NVL is specifically optimized for deploying already-trained neural networks at scale.
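
To see why the NVL's capacity matters for inference, consider a rough weights-only sizing sketch. The numbers below are simplifications under stated assumptions (2 bytes per parameter at FP16, 1 byte at FP8) and ignore the KV cache, activations, and runtime overhead, which all add to the real footprint:

```python
# Weights-only memory estimate for serving an LLM within the NVL's 188 GB.
# Ignores KV cache, activations, and framework overhead (all add more).

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just to hold the model weights."""
    return params_billions * bytes_per_param  # 1B params x N bytes = N GB

NVL_VRAM_GB = 188

for params in (70, 175):
    for precision, nbytes in (("FP16", 2), ("FP8", 1)):
        need = weights_gb(params, nbytes)
        verdict = "fits" if need <= NVL_VRAM_GB else "does not fit"
        print(f"{params}B @ {precision}: ~{need:.0f} GB -> {verdict}")
```

A 175B-parameter model, for example, doesn't fit in 188GB at FP16 (~350GB of weights alone) but does at FP8 (~175GB) – roughly consistent with the "up to 200 billion parameters" framing.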

The H100 SXM5: Heavyweight for AI Training

SXM technology is a form factor and interconnect standard primarily used for GPUs in large-scale AI applications and data center environments. Instead of relying on PCIe slots like traditional GPUs, SXM GPUs are directly socketed into the motherboard. This allows for faster, high-bandwidth connections as well as better power delivery and cooling solutions – crucial for high-end GPUs, particularly in dense server environments.

Ultimately, this setup helps ensure that SXM GPUs perform efficiently even under the most intense workloads. For instance, NVIDIA's H100 SXM5 is designed for AI training first and foremost – a powerhouse GPU that is especially well suited to training foundational models.

With 528 tensor cores, 80 GB of HBM3 memory, and a memory bandwidth of 3.35 TB/s, the SXM5 is designed for the most demanding machine learning and deep learning tasks. It also benefits from NVLink interconnects that allow up to 900 GB/s of GPU-to-GPU bandwidth, making it ideal for multi-GPU configurations in large-scale AI model training.
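
If you want to confirm what hardware you've actually been allocated on a rented instance, a minimal PyTorch sketch like this can help (reported device names and totals vary by driver and system; `nvidia-smi topo -m` will additionally show the NVLink topology):

```python
import torch

# Inventory the GPUs PyTorch can see on a multi-GPU node.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

# Peer-to-peer access between GPU 0 and GPU 1.
# NVLink-connected pairs report True (PCIe P2P can as well).
if torch.cuda.device_count() >= 2:
    print("P2P 0<->1:", torch.cuda.can_device_access_peer(0, 1))
```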

Like the H100 NVL, the SXM5 features HBM3 memory. However, the SXM5's form factor is a bit more restrictive. With each unit pulling 700W, running these systems requires a lot of power and cooling, which can pose a challenge for some datacenters. Most colocation racks have a power capacity of 6-10kW, meaning systems with four or more SXM5s can push infrastructure to its limits.

(The 700-800W H100 NVL is comparatively easier to accommodate. A single-socket, dual H100 NVL configuration – with four GH100 dies – would likely require around 2.5kW.)
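
Those power figures are easy to ballpark. The host overhead below is an assumption for illustration – CPUs, NICs, fans, and storage vary widely by system:

```python
# Rough server power budgets from the TDP figures quoted above.
SXM5_TDP_W = 700
NVL_TDP_W = 800          # top of the NVL's 700-800W configurable range
HOST_OVERHEAD_W = 900    # assumed CPU/NIC/cooling/storage overhead

print(f"4x SXM5 server:       ~{(4 * SXM5_TDP_W + HOST_OVERHEAD_W) / 1000:.1f} kW")  # ~3.7 kW
print(f"Dual H100 NVL server: ~{(2 * NVL_TDP_W + HOST_OVERHEAD_W) / 1000:.1f} kW")   # ~2.5 kW
```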

A direct comparison of specs would be helpful in deciding which accelerator best suits your needs, especially when balancing power efficiency, infrastructure requirements, and the performance demands of your workloads.

Which GPU is Right for You?

Here's an outline of some of the features and specs of the H100 NVL and the SXM5, as well as the H100 PCIe for good measure:

| Feature | H100 NVL | H100 PCIe | H100 SXM5 |
| --- | --- | --- | --- |
| Architecture | Hopper | Hopper | Hopper |
| Memory Clock | ~5.1 Gbps HBM3 | 3.2 Gbps HBM2e | 5.23 Gbps HBM3 |
| Memory Bus Width | 6144-bit | 5120-bit | 5120-bit |
| Memory Bandwidth | 2 x 3.9 TB/s | 2 TB/s | 3.35 TB/s |
| FP32 CUDA Cores | 2 x 16896 | 14592 | 16896 |
| Tensor Cores | 2 x 528 | 456 | 528 |
| Boost Clock | 1.98 GHz | 1.75 GHz | 1.98 GHz |
| VRAM | 2 x 94 GB (188 GB) | 80 GB | 80 GB |
| INT8 Tensor | 2 x 1980 TOPS | 1513 TOPS | 1980 TOPS |
| FP16 Tensor | 2 x 990 TFLOPS | 756 TFLOPS | 990 TFLOPS |
| FP32 Tensor | 2 x 495 TFLOPS | 378 TFLOPS | 495 TFLOPS |
| FP64 Tensor | 2 x 67 TFLOPS | 51 TFLOPS | 67 TFLOPS |
| Interconnect | NVLink 4 (600 GB/s) | NVLink 4 (600 GB/s) | NVLink 4, 18 Links (900 GB/s) |
| GPU | 2 x GH100 | GH100 | GH100 |
| Transistor Count | 2 x 80B | 80B | 80B |
| TDP | 700-800W | 350W | 700W |
| Interface | 2 x PCIe 5.0 (Quad Slot) | PCIe 5.0 (Dual Slot) | SXM5 |

The choice between the NVIDIA H100 NVL and SXM5 ultimately depends on your specific project requirements. The NVL is the more versatile and accessible option: it drops into standard PCIe servers and pairs its 188GB of VRAM with inference-oriented performance. The SXM5, meanwhile, offers the highest interconnect bandwidth and scalability for demanding multi-GPU training workloads.

If you prioritize flexibility and ease of integration, the NVL could be the better choice. However, if you need the highest possible performance and are willing to invest in a more specialized HGX- or DGX-class setup, the SXM5 is your ideal solution. Carefully consider your project's needs, budget, and infrastructure before making a decision.
