Nvidia unveils Vera Rubin: seven chips and a full platform for AI factories
Nvidia announced Vera Rubin not as just another GPU, but as a full stack for AI factories: Rubin GPU, Vera CPU, NVLink 6, ConnectX-9, BlueField-4…
AI-processed from 3DNews AI; edited by Hamidun News
Nvidia presented Vera Rubin not as a single accelerator but as an entire platform for AI factories, spanning GPUs, CPUs, network interfaces, DPUs, storage systems, and Ethernet switches. The company positions it as the next stage after Blackwell: an infrastructure in which racks and clusters are designed as a single supercomputer for agentic AI.
Full Platform
Instead of announcing another "fastest GPU," Nvidia presented a complete stack of seven chips and several rack types covering the different stages of AI work: pretraining, post-training, test-time scaling, and inference for agentic systems. At the center of the platform are the Rubin GPU and Vera CPU, with NVLink 6, ConnectX-9 SuperNIC, BlueField-4, Spectrum-6, and Groq 3 LPX inference accelerators built around them. By design, all of this is meant to work not as a collection of separate servers but as one interconnected compute system.
Nvidia specifically emphasizes a shift from individual servers to POD- and rack-scale systems. The logic is straightforward: modern models and AI agents are constrained not only by accelerators but also by networking, memory, KV-cache storage, cooling, and power consumption. Vera Rubin is therefore sold not as a single chip but as an architecture for an entire AI factory, assembled from ready-made modules tailored to a specific type of workload and budget.
"Vera Rubin is a generational leap: seven breakthrough chips, five racks, and one gigantic supercomputer."
What's in the Stack
The basic Vera Rubin NVL72 configuration combines 72 Rubin GPUs and 36 Vera CPUs in a single rack. The components are connected through NVLink 6, while ConnectX-9 and BlueField-4 handle network connectivity and the offloading of infrastructure tasks. Nvidia claims that such a system can train large mixture-of-experts models with a quarter as many GPUs as the Blackwell platform, and that in inference it delivers up to 10 times more throughput per watt at one-tenth the cost per token. Around this rack, the company has assembled several additional specialized blocks:
- Vera CPU Rack — up to 256 Vera processors for reinforcement learning and agentic workloads
- Groq 3 LPX Rack — 256 LPU chips for low-latency inference and long context
- BlueField-4 STX — storage and KV-cache processing layer for models and agents
- Spectrum-6 SPX — Ethernet rack for fast data exchange between nodes
- Quantum-X800 / Spectrum-X — scaling clusters between racks
Special emphasis was placed on the Vera CPU rack: it is designed for scenarios in which agents must not only generate a response but also repeatedly test candidate actions in external environments. According to Nvidia, Vera delivers results 50% faster than traditional CPUs and is twice as energy efficient. For long-context model inference, the company added Groq 3 LPX: 256 LPUs in a rack, 128 GB of on-die SRAM, and up to 640 TB/s of internal throughput.
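To make the rack figures concrete, here is a minimal Python sketch of the arithmetic behind the NVL72 claims. Only the 72/36 split and the "four times fewer GPUs" ratio come from the announcement; the function names and the baseline Blackwell GPU count in the usage note are hypothetical, and the result takes the vendor claim at face value.

```python
import math

# Figures quoted in the announcement (vendor claims, not measurements).
GPUS_PER_NVL72 = 72
CPUS_PER_NVL72 = 36
MOE_TRAINING_GPU_REDUCTION = 4  # "four times fewer GPUs" than Blackwell


def gpus_per_cpu() -> float:
    """Ratio of Rubin GPUs to Vera CPUs in one NVL72 rack."""
    return GPUS_PER_NVL72 / CPUS_PER_NVL72


def rubin_gpus_needed(blackwell_gpus: int) -> int:
    """GPUs needed for the same MoE training job if the 4x claim holds.

    `blackwell_gpus` is a hypothetical baseline supplied by the caller.
    """
    return math.ceil(blackwell_gpus / MOE_TRAINING_GPU_REDUCTION)


def racks_needed(gpus: int) -> int:
    """Number of NVL72 racks required to host the given number of GPUs."""
    return math.ceil(gpus / GPUS_PER_NVL72)
```

For example, with a hypothetical baseline of 4,608 Blackwell GPUs, `rubin_gpus_needed(4608)` gives 1,152 GPUs, which `racks_needed` maps to 16 NVL72 racks. Real sizing would depend on the model, interconnect, and utilization, none of which the announcement specifies.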
Economics and Scale
The most important part of the announcement is not the list of components but the economics of operation. Nvidia promises up to 35 times higher inference throughput per megawatt when Vera Rubin is combined with Groq 3 LPX, and says BlueField-4 STX should accelerate KV-cache operations by up to five times compared with more traditional storage architectures. For the Spectrum-6 Ethernet network, the company claims up to a fivefold improvement in optical energy efficiency and a tenfold increase in reliability when using co-packaged optics.
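A throughput-per-megawatt multiplier translates directly into throughput at a fixed power budget and into energy per token. The sketch below illustrates that relationship; the 35x figure is Nvidia's "up to" claim, while the function names, the baseline throughput, and the site power in the usage note are hypothetical placeholders.

```python
# Vendor-claimed upper bound: inference throughput per megawatt
# when Vera Rubin is combined with Groq 3 LPX.
INFERENCE_PER_MW_GAIN = 35.0


def tokens_per_second(baseline_tokens_per_s_per_mw: float,
                      power_mw: float,
                      gain: float = INFERENCE_PER_MW_GAIN) -> float:
    """Throughput at a fixed power budget, assuming the gain applies fully.

    The baseline throughput per megawatt is a hypothetical input.
    """
    return baseline_tokens_per_s_per_mw * power_mw * gain


def energy_per_token_ratio(gain: float = INFERENCE_PER_MW_GAIN) -> float:
    """Energy per token relative to the baseline.

    Energy per token is power divided by throughput, so a gain of g in
    throughput per megawatt means 1/g of the energy per token.
    """
    return 1.0 / gain
```

With a hypothetical baseline of 1,000,000 tokens per second per megawatt and a 10 MW site, `tokens_per_second(1_000_000, 10)` returns 350,000,000, and `energy_per_token_ratio()` is 1/35, or roughly 3% of the baseline energy per token. Because the claim is stated as "up to," these are best-case values.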
Alongside the hardware, Nvidia introduced the DSX platform for Vera Rubin data centers. According to the company, the DSX Max-Q version makes it possible to deploy up to 30% more AI infrastructure in a data center within the same power budget, while DSX Flex lets operators treat the data center's power system as a more flexible asset. Partner shipments of products based on Vera Rubin should begin in the second half of 2026.
Early partners include AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave, Lambda, Together AI, as well as Dell, HPE, Lenovo, and Supermicro.
What This Means
Nvidia is steadily moving away from selling individual accelerators and toward the role of a supplier of complete architectures for AI factories. For the market, this signals that competition will no longer be driven solely by GPU performance, but also by token price, network efficiency, memory handling, and how quickly an entire cluster for agentic AI can be deployed.