
OpenAI unveils the MRC protocol for supercomputer networks with millions of GPUs

Source: MarkTechPost. Collage: Hamidun News.

OpenAI, in collaboration with AMD, Broadcom, Intel, Microsoft, and NVIDIA, has presented MRC (Multipath Reliable Connection), a new open network protocol for large AI clusters. The protocol targets a key problem: how to build a supercomputer with hundreds of thousands of GPUs when network reliability becomes the bottleneck.

How MRC Works

MRC distributes data packets across hundreds of network paths simultaneously. If one path fails, data travels along alternative routes without losing speed. Recovery from failures takes microseconds, fast enough that neural network training barely notices the disruption. Traditional network protocols select one primary path and switch to a backup only if the primary fails. MRC works on a fundamentally different principle: it monitors hundreds of potential paths in real time and dynamically distributes the load across the healthy ones. The difference is like that between a single road with one detour and a grid of side streets where cars can take any free route.
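The idea of spreading packets over many paths and skipping failed ones can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `MultipathSprayer`, its methods, and the round-robin policy are hypothetical and chosen for clarity. The article does not describe MRC's actual path-selection algorithm, wire format, or failure detection.

```python
class MultipathSprayer:
    """Toy model (hypothetical, not the MRC algorithm): send packets
    round-robin over many paths and skip any path marked as failed."""

    def __init__(self, num_paths):
        if num_paths <= 0:
            raise ValueError("num_paths must be positive")
        self.num_paths = num_paths
        self.healthy = set(range(num_paths))  # all paths start healthy
        self._next = 0                        # round-robin cursor

    def mark_failed(self, path):
        # Take a path out of rotation; later packets simply avoid it.
        self.healthy.discard(path)

    def mark_recovered(self, path):
        # Put a valid path back into rotation.
        if 0 <= path < self.num_paths:
            self.healthy.add(path)

    def pick_path(self):
        if not self.healthy:
            raise RuntimeError("no healthy paths left")
        # Advance the cursor until it lands on a healthy path.
        while True:
            candidate = self._next
            self._next = (self._next + 1) % self.num_paths
            if candidate in self.healthy:
                return candidate

    def spray(self, num_packets):
        """Return {path_id: packets_sent} for a burst of packets."""
        counts = {}
        for _ in range(num_packets):
            path = self.pick_path()
            counts[path] = counts.get(path, 0) + 1
        return counts


if __name__ == "__main__":
    sprayer = MultipathSprayer(num_paths=4)
    print(sprayer.spray(8))   # load spread evenly over all four paths
    sprayer.mark_failed(2)    # simulate a link failure
    print(sprayer.spray(9))   # path 2 gets nothing; the rest absorb the load
```

In this sketch a failure costs nothing beyond removing one entry from the set of healthy paths, whereas a primary-and-backup scheme has to detect the failure and then move all traffic at once.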

Practical Advantages

MRC's main achievement is a simpler supercomputer architecture. Previously, clusters with 100,000+ GPUs required a three-tier hierarchy of Ethernet switches, which is expensive, complex to assemble, and energy-hungry to cool. MRC makes do with just two tiers, which radically simplifies the design and reduces equipment costs.

  • Less network equipment, which simplifies assembly and maintenance
  • Lower network latency thanks to more direct paths between GPUs
  • Lower power consumption for switch cooling
  • Better scalability: the architecture is designed to work for up to millions of GPUs
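To see why the number of switch tiers matters, a back-of-the-envelope calculation helps. The sketch below uses the standard capacity formulas for a non-blocking folded-Clos (fat-tree) network built from identical switches with `radix` ports: about radix²/2 hosts with two tiers and radix³/4 with three. The radix values are purely illustrative assumptions. The article gives no port counts and does not explain how MRC reaches 100,000+ GPUs with two tiers.

```python
def max_hosts_two_tier(radix: int) -> int:
    """Leaf-spine: each leaf uses half its ports for hosts and half for
    uplinks, and each spine port connects one leaf, so radix leaves fit."""
    return radix * radix // 2


def max_hosts_three_tier(radix: int) -> int:
    """Classic three-tier fat-tree capacity for radix-port switches."""
    return radix ** 3 // 4


def min_tiers(num_gpus: int, radix: int) -> int:
    """Smallest tier count (2 or 3) that fits num_gpus, one port per GPU."""
    if num_gpus <= max_hosts_two_tier(radix):
        return 2
    if num_gpus <= max_hosts_three_tier(radix):
        return 3
    raise ValueError("more than three tiers needed at this radix")


if __name__ == "__main__":
    for radix in (128, 512):  # hypothetical port counts, not from the article
        print(radix, max_hosts_two_tier(radix), min_tiers(100_000, radix))
```

Under these assumptions, a 100,000-GPU cluster needs three tiers with 128-port switches (the two-tier limit is 8,192 hosts) but fits in two tiers with 512-port switches (limit 131,072). Two-tier capacity grows with the square of the port count, so it is very sensitive to how many links each switch can terminate.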

Open Standard for the Ecosystem

MRC is not a closed solution from one company. OpenAI chose an open approach and brought in major chip, networking, and cloud vendors: AMD, Broadcom, Intel, Microsoft, and NVIDIA. This means other companies, cloud providers, and research centers will be able to implement MRC in their own supercomputers. Openness matters because at these scales even small improvements in network reliability and efficiency affect the cost of training models and the pace of development across the entire AI industry.

What This Means

MRC is an answer to the challenge of scale. As AI models grow, so do the demands on computational infrastructure. A network architecture that works for a cluster of 10,000 GPUs can become a bottleneck at 500,000 GPUs. MRC makes it possible to build even larger supercomputers without radically rethinking the architecture. For the industry, this means cheaper model training and faster deployment of innovations.
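A rough calculation shows why reliability gets harder as clusters grow. The sketch assumes one network link per GPU, independent failures, and a per-link failure probability of 0.00001 over some fixed time window. All three are hypothetical simplifications for illustration, not figures from the article.

```python
def prob_any_failure(num_links: int, p_link_fail: float) -> float:
    """Probability that at least one of num_links independent links
    fails, given each fails with probability p_link_fail."""
    return 1.0 - (1.0 - p_link_fail) ** num_links


if __name__ == "__main__":
    p = 1e-5  # hypothetical per-link failure probability per time window
    for gpus in (10_000, 500_000):
        print(gpus, round(prob_any_failure(gpus, p), 4))
```

With these made-up numbers, the chance of at least one link failing in the window is just under 10% at 10,000 links and above 99% at 500,000. At that scale failures are routine events, so a protocol has to absorb them without stalling the training job.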

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.