MRC (Multipath Reliable Connection) is a transport protocol developed by Microsoft, OpenAI, AMD, Broadcom, and NVIDIA.1

For now, see Ultra Ethernet. Many of the capabilities described there are actually implemented by MRC, including:

  • Packet spraying: Packets can flow across multiple different network paths between source and destination, allowing load to be balanced. This also allows packets to flow around bad links in the fabric.
  • Packet trimming: Instead of dropping a whole packet and waiting for the receiver to time out, switches can drop the payload but deliver the header. This gives devices along the network path an active signal that loss (congestion, bad links) is occurring.

I think of MRC as an implementation that pre-dates a standard.

MRC was designed to work with multi-plane trees where individual ports (e.g., 800G) are split and connected to multiple switches (e.g., 8x100G switches).2 By effectively increasing switch radix, much larger networks can be built using fewer layers.

Footnotes

  1. Building resilient networks for AI supercomputers | Microsoft Community Hub

  2. resilient-ai-supercomputer-networking-using-mrc-and-srv6.pdf