Ozaki scheme

The Ozaki scheme is a family of algorithmic methods that emulate high-precision matrix-matrix multiplications using low-precision arithmetic units like tensor cores. It is being promoted as a way that the modeling/simulation community can still use GPUs developed for AI despite them no longer having robust hardware FP64 support.

NVIDIA promotes Ozaki heavily, but AMD is less bullish on it due to the significant memory overheads of these emulation techniques.¹ HBM is a precious resource because many HPC applications are already limited by memory capacity, and relying on Ozaki pushes those memory-bound problems in the wrong direction.

As a substitute for FFTs

NVIDIA developed an an alternative to FFT-based direct numerical simulation (of incompressible fluids) that replaces the Fourier transform with emulated DGEMMs.² It still requires the same all-to-all communications of 3D FFTs, but it does not rely on strictly uniform grids.

Post | LinkedIn ↩
[2603.09528] A GEMM-based direct solver for finite-difference Poisson problems in non-uniform grids ↩

Glenn's Digital Garden

Explorer

Ozaki scheme

As a substitute for FFTs

Graph View

Glenn's Digital Garden

Explorer

Ozaki scheme

As a substitute for FFTs

Footnotes

Graph View