Rubin CPX is a GPU announced at the 2025 AI Infrastructure Summit that is “specifically optimized for context processing.” It swaps out HBM for GDDR7 and increases the NVFP4 performance over Rubin by 20%. From the announcement, it has:

  • 30 PF NVFP4 (assume sparse)
  • “3x Exponent Operations” - which are “attention acceleration cores” - compared to GB300 NVL72
  • 128 GB GDDR7 instead of HBM - because prefill is compute-limited, not memory-bandwidth-limited like decode (see the back-of-envelope sketch after this list)
  • 4 NVENC encoders and 4 NVDEC decoders for processing and generating AI video
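
A rough back-of-envelope illustrates the compute-vs-bandwidth point. Every number below (model size, precision, prompt length) is a hypothetical placeholder, not a CPX specification:

```python
# Back-of-envelope arithmetic intensity (FLOPs per byte of weights read)
# for prefill vs. decode. All numbers are hypothetical placeholders.

params = 70e9          # assume a 70B-parameter dense model
bytes_per_param = 0.5  # NVFP4 weights: 4 bits per parameter
weight_bytes = params * bytes_per_param

def flops_per_step(tokens):
    # ~2 FLOPs per parameter per token for a forward pass (ignoring attention)
    return 2 * params * tokens

prefill_tokens = 128_000  # whole prompt processed in parallel
decode_tokens = 1         # one token generated per step

for name, tokens in [("prefill", prefill_tokens), ("decode", decode_tokens)]:
    intensity = flops_per_step(tokens) / weight_bytes
    print(f"{name}: ~{intensity:,.0f} FLOPs per byte of weights")

# prefill: ~512,000 FLOPs per weight byte -> bound by compute
# decode:  ~4 FLOPs per weight byte       -> bound by memory bandwidth
```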

It is to be released at the end of 2026 as a fast follow-on to the R200 launch.

Platforms

The following slide from Ian Buck summarizes the two ways in which NVIDIA will ship CPX:1

“Vera Rubin NVL144 CPX”

There will be a new VR200 NVL144 tray (“VR NVL 144”) which incorporates 8x CPX GPUs in addition to the 8x Rubin GPUs in each tray:

Feature             VR144-only      VR144 with CPX
NVFP4 FLOPS         3.6 EF          8.0 EF
Memory Bandwidth    1.4 PB/s        1.7 PB/s
“Fast memory”       75 TB           100 TB
Network             8x ConnectX-9   8x ConnectX-9

In one of these VR NVL144 CPX racks, that comes to 8 EF of NVFP4 compute, 1.7 PB/s of memory bandwidth, and 100 TB of “fast memory.”

“Vera Rubin CPX Dual Rack”

NVIDIA will also make a CPX-only tray (VR-CPX) with 8x CPX. From the slide above, it doesn’t look like the CPX trays have scale-up NVLink connectivity. This implies that the non-CPX nodes will have to fetch KV caches from CPX nodes over InfiniBand.
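
To get a feel for what that implies, here is a rough estimate of the KV-cache size for a long prompt and its transfer time over a single network link. The model shape and link speed are assumptions for illustration, not published figures:

```python
# Rough KV-cache size and transfer-time estimate for a 1M-token prompt.
# Model dimensions and link speed are assumptions, not published CPX figures.

layers = 80              # hypothetical decoder layers
kv_heads = 8             # grouped-query-attention KV heads (assumption)
head_dim = 128
bytes_per_elem = 2       # FP16/BF16 KV cache
prompt_tokens = 1_000_000

# 2x for keys and values
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * prompt_tokens
print(f"KV cache: ~{kv_bytes / 1e9:.0f} GB")

link_gbps = 800          # assumed per-NIC link speed in Gb/s
transfer_s = kv_bytes * 8 / (link_gbps * 1e9)
print(f"Transfer over one {link_gbps} Gb/s link: ~{transfer_s:.1f} s")
```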

Target applications

Ian suggested the use case would be a disaggregated prefill/decode flow (sketched in code after this list):

  1. Perform prefill on CPX nodes in one part of a datacenter.
  2. As soon as the first token is ready, ship all keys and values for the prompt to a non-CPX node (presumably via InfiniBand, as these CPX nodes are not on the same NVLink domain as the HBM nodes)
  3. HBM GPUs begin decode using the computed key and value vectors.
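
A minimal sketch of that flow, using toy stand-in classes rather than any NVIDIA or framework API:

```python
# Minimal sketch of disaggregated prefill/decode serving. The classes below
# are toy stand-ins, not NVIDIA or framework APIs; the point is how the KV
# cache flows from a CPX node to an HBM node.

class CPXNode:
    """Compute-dense prefill node (GDDR7)."""
    def prefill(self, prompt_tokens):
        # Process the whole prompt in parallel: build one KV entry per token
        # and emit the first output token (placeholder "model" below).
        kv_cache = [("kv", t) for t in prompt_tokens]
        first_token = sum(prompt_tokens) % 100
        return kv_cache, first_token

class HBMNode:
    """Bandwidth-rich decode node (HBM)."""
    def __init__(self):
        self.kv_cache = None
    def receive_kv_cache(self, kv_cache):
        # In the real system this transfer happens over InfiniBand/RDMA.
        self.kv_cache = list(kv_cache)
    def decode_step(self, token):
        # One token per step: append to the KV cache, emit the next token.
        self.kv_cache.append(("kv", token))
        return (token + 7) % 100

def serve_request(prompt_tokens, cpx_node, hbm_node, max_new_tokens=8):
    kv_cache, token = cpx_node.prefill(prompt_tokens)  # 1. prefill on CPX
    hbm_node.receive_kv_cache(kv_cache)                # 2. ship KV cache
    output = [token]
    for _ in range(max_new_tokens - 1):                # 3. decode on HBM
        token = hbm_node.decode_step(token)
        output.append(token)
    return output

print(serve_request([3, 1, 4, 1, 5], CPXNode(), HBMNode()))
```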

Industries

NVIDIA touted a few AI companies as launch partners:

  • Code generation: Cursor, Magic
  • Inferencing-as-a-Service platforms: Fireworks AI and together.ai are trial customers, citing the need for huge context windows (1M-100M tokens) to ingest entire codebases for code-generation applications.
  • Creative/media generation: Runway

Pricing?

I’m not sure what pricing to expect for this GPU. It doesn’t have HBM, which is a significant cost and complexity savings, but prefill is often the most expensive part of LLM inferencing. Cutting the time spent in prefill (and thereby increasing overall inferencing throughput) seems like a high-value proposition that reduces the number of expensive Rubin HBM GPUs required.

That said, prefill only dominates cost for very long prompts, which may limit CPX’s overall value.
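
A rough model of where that crossover might sit; every number below (model size, sustained compute, memory bandwidth, output length) is an assumption for illustration:

```python
# Rough comparison of prefill vs. decode share of GPU time as prompts grow.
# All numbers (model size, hardware rates, output length) are assumptions.

params = 70e9                 # hypothetical 70B dense model
flops_per_token = 2 * params  # forward-pass FLOPs per token (ignores attention)

compute_flops = 1e16          # 10 PFLOP/s sustained prefill compute (assumption)
mem_bw = 8e12                 # 8 TB/s HBM bandwidth for decode (assumption)
weight_bytes = params * 0.5   # NVFP4 weights: 4 bits per parameter

output_tokens = 1_000         # fixed-length response (assumption)
decode_s = output_tokens * weight_bytes / mem_bw   # bandwidth-bound decode time

for prompt_tokens in (1_000, 10_000, 100_000, 1_000_000):
    prefill_s = prompt_tokens * flops_per_token / compute_flops
    share = prefill_s / (prefill_s + decode_s)
    print(f"{prompt_tokens:>9,} prompt tokens: prefill is ~{share:5.1%} of GPU time")
```

Under these assumptions, prefill is a rounding error at a few thousand prompt tokens but the majority of GPU time at a million.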

Footnotes

  1. https://www.nvidia.com/en-us/events/ai-infra-summit/