NVIDIA GB200: Exascale Computing in a Rack through Intelligent Task Scheduling
NVIDIA has released a methodology for getting the most out of GB200 NVL72: with Slurm's topology-aware scheduling, a single rack delivers exascale computing for trillion-parameter models.

Large-scale AI models require enormous computational resources, and infrastructure efficiency turns out to depend not only on the hardware but also on how workloads are placed on it. NVIDIA has released a detailed guide to running GB200 NVL72 with the Slurm scheduler, which takes network topology into account to distribute computation optimally across a cluster.
## Exascale Machine in a Single Rack
NVIDIA GB200 NVL72 is a system that packs exascale (10^18 FLOP/s) computing into a single rack. That much compute makes it possible to run trillion-parameter AI models in real time, a task that previously required an entire data center. However, the stated performance is achievable only if tasks are placed correctly, that is, if scheduling accounts for the physical network topology between nodes, in particular which nodes share the rack's NVLink domain and which must communicate over slower links. Poor workload placement reduces effective bandwidth and can negate the hardware's advantages.
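To put 10^18 FLOP/s in perspective for a trillion-parameter model, here is a back-of-the-envelope sketch. It assumes a dense model and the common approximation of about 2 FLOPs per parameter per generated token; it ignores memory bandwidth, communication, and utilization losses, so the result is only a compute-bound upper bound, not a measured GB200 figure.

```python
# Rough, compute-bound estimate only. Assumptions (not from NVIDIA's guide):
# a dense model, ~2 FLOPs per parameter per token, and perfect utilization.

EXAFLOP_PER_SEC = 1e18  # "exascale" as defined in the text
PARAMS = 1e12           # a trillion-parameter model


def flops_per_token(num_params: float) -> float:
    """Approximate FLOPs for one token: one multiply-add (2 FLOPs) per parameter."""
    return 2.0 * num_params


def peak_tokens_per_second(peak_flops: float, num_params: float) -> float:
    """Upper bound on token throughput if raw compute were the only limit."""
    return peak_flops / flops_per_token(num_params)


if __name__ == "__main__":
    tps = peak_tokens_per_second(EXAFLOP_PER_SEC, PARAMS)
    print(f"{tps:,.0f} tokens/s upper bound")  # 500,000 tokens/s upper bound
```

Real throughput will be far lower; the point of the gap between this bound and practice is exactly what placement and communication overheads eat into.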
## Topology Solves Half the Problem
When multiple GPU accelerators work together, the communication time between them becomes a critical factor. If a job is spread across nodes that are far apart in the network hierarchy, latency rises sharply and much of the hardware's potential is wasted. This is where Slurm (originally "Simple Linux Utility for Resource Management") comes in: the standard scheduler in HPC clusters, which supports topology-aware scheduling.
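A toy cost model makes the effect of placement concrete. The sketch below assumes a job in which every node exchanges data with every other node, and assigns hypothetical per-pair costs: cheap inside one rack's NVLink domain, ten times more expensive across racks. The cost values and function names are illustrative, not measurements or Slurm internals.

```python
from itertools import combinations

# Hypothetical per-pair communication costs (arbitrary units).
INTRA_RACK_COST = 1.0   # both nodes in the same rack / NVLink domain
INTER_RACK_COST = 10.0  # traffic must cross to another rack


def pair_cost(rack_a: int, rack_b: int) -> float:
    """Cost of one node pair exchanging data, based only on rack membership."""
    return INTRA_RACK_COST if rack_a == rack_b else INTER_RACK_COST


def all_to_all_cost(placement: list[int]) -> float:
    """Total cost when every node talks to every other node.

    placement[i] is the rack index hosting the job's i-th node.
    """
    return sum(pair_cost(a, b) for a, b in combinations(placement, 2))


if __name__ == "__main__":
    packed = [0] * 8     # all 8 nodes in rack 0
    spread = [0, 1] * 4  # 8 nodes split 4/4 across racks 0 and 1
    print(all_to_all_cost(packed))  # 28 intra pairs -> 28.0
    print(all_to_all_cost(spread))  # 12 intra + 16 inter pairs -> 172.0
```

The same eight nodes cost about six times more to coordinate when split across two racks, which is the intuition behind keeping tightly coupled jobs inside one NVLink domain.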
This means that Slurm can:
* See the complete map of the physical network topology between all nodes
* Place computational workloads so that nodes exchanging data are close to each other
* Account for different levels of the hierarchy (high-speed intra-rack connectivity vs. inter-rack links)
* Automatically optimize the distribution of multi-node jobs without human intervention
* Minimize contention for network resources between parallel jobs
## How It Works in Practice
For engineers working with trillion-parameter models, this is a major simplification. Instead of manually optimizing task placement for each job, an engineer simply submits it to Slurm, and the scheduler itself chooses the best configuration based on the current topology and load.
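The kind of decision the scheduler makes can be sketched with a simple heuristic. The function below is a hypothetical, simplified best-fit placement, not Slurm's actual selection algorithm: if one rack can hold the whole job, it picks the fullest such rack (leaving larger free blocks for future jobs); otherwise it takes the racks with the most free nodes first so the job spans as few racks as possible. Rack names and free-node counts are made up for illustration.

```python
from typing import Optional


def choose_racks(free_nodes: dict[str, int], nodes_needed: int) -> Optional[list[str]]:
    """Hypothetical topology-aware placement: keep a job in as few racks as possible.

    free_nodes maps rack name -> number of idle nodes in that rack.
    Returns the chosen racks, or None if the request cannot be satisfied.
    """
    # Case 1: the job fits in a single rack. Use best fit (smallest sufficient
    # free block) so larger contiguous blocks stay available for bigger jobs.
    fitting = [rack for rack, free in free_nodes.items() if free >= nodes_needed]
    if fitting:
        return [min(fitting, key=lambda rack: free_nodes[rack])]

    # Case 2: the job must span racks. Take the emptiest racks first to
    # minimize the number of rack boundaries the job's traffic must cross.
    chosen, remaining = [], nodes_needed
    for rack in sorted(free_nodes, key=free_nodes.get, reverse=True):
        if remaining <= 0:
            break
        if free_nodes[rack] > 0:
            chosen.append(rack)
            remaining -= free_nodes[rack]
    return chosen if remaining <= 0 else None


if __name__ == "__main__":
    free = {"rack-a": 18, "rack-b": 6, "rack-c": 10}
    print(choose_racks(free, 8))   # ['rack-c']: best single-rack fit
    print(choose_racks(free, 20))  # ['rack-a', 'rack-c']: fewest racks
    print(choose_racks(free, 40))  # None: not enough free nodes
```

A production scheduler also weighs job priority, fragmentation, and multi-level switch hierarchies; this sketch only captures the "stay within one domain if possible" preference.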
According to NVIDIA, with proper topology-aware scheduling the GB200 NVL72 reaches its stated exascale performance while fully utilizing inter-node bandwidth. Without this optimization, performance drops by 30-50%, and the cluster effectively becomes an expensive test bed.
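Why poor placement costs so much can be seen from a minimal step-time model. The sketch assumes communication is not overlapped with compute and that bad placement simply lowers the effective bandwidth; the compute time, traffic volume, and bandwidth figures are hypothetical and are not meant to reproduce NVIDIA's numbers.

```python
def step_time(compute_s: float, comm_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time per training step when communication is not overlapped with compute."""
    return compute_s + comm_bytes / bandwidth_bytes_per_s


def relative_throughput(compute_s: float, comm_bytes: float,
                        good_bw: float, bad_bw: float) -> float:
    """Throughput under degraded bandwidth, as a fraction of the well-placed case."""
    return step_time(compute_s, comm_bytes, good_bw) / step_time(compute_s, comm_bytes, bad_bw)


if __name__ == "__main__":
    # Hypothetical: 1 s of compute and 20 GB exchanged per step; good placement
    # gives 100 GB/s effective bandwidth, poor placement only 20 GB/s.
    ratio = relative_throughput(1.0, 20e9, 100e9, 20e9)
    print(f"{ratio:.2f}")  # 0.60
```

Here the well-placed step takes 1.2 s and the poorly placed one 2.0 s, so the same hardware does 40% less work; the more communication-heavy the model, the larger the penalty.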
The full power of the infrastructure is unlocked not so much by buying more chips as by distributing tasks intelligently across the existing hardware.
## What This Means
The era when it was enough to buy more equipment and launch training is ending.
Teams training very large models on distributed clusters now need to think about topology and scheduling as carefully as about the GPUs and memory themselves. Slurm with topology support is becoming a mandatory part of the engineering stack for serious AI clusters, whether in corporate data centers or at cloud providers.