Model FLOPs Utilization (MFU) is a metric proposed by Google that describes how effectively a GPU is utilized during model training:1

From the PaLM paper

We propose a new metric for efficiency that is implementation-independent and permits a cleaner comparison of system efficiency, called model FLOPs utilization (MFU). This is the ratio of the observed throughput (tokens-per-second) relative to the theoretical maximum throughput of a system operating at peak FLOPs. Crucially, the “theoretical maximum” throughput only accounts for the required operations to compute the forward+backward passes, and not rematerialization.

It sounds similar to arithmetic intensity, which is an essential component of the roofline model.
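
As a rough sketch of how the metric works in practice (assuming the common ~6 × parameter-count approximation for forward+backward FLOPs per token of a dense transformer, and hypothetical hardware and model numbers):

```python
# Rough MFU estimate: observed throughput vs. theoretical peak throughput.
# The 6 * n_params FLOPs-per-token figure is the standard forward+backward
# approximation for dense transformers; it excludes rematerialization,
# as the PaLM definition requires.

def model_flops_utilization(tokens_per_second: float,
                            n_params: float,
                            peak_flops_per_second: float) -> float:
    flops_per_token = 6 * n_params          # forward + backward pass
    achieved_flops = tokens_per_second * flops_per_token
    return achieved_flops / peak_flops_per_second

# Hypothetical example: a 70B-parameter model on 8 GPUs rated at
# 312 TFLOP/s each (bf16), observing 2,600 tokens/s of training throughput.
mfu = model_flops_utilization(
    tokens_per_second=2_600,
    n_params=70e9,
    peak_flops_per_second=8 * 312e12,
)
print(f"MFU: {mfu:.1%}")  # ~43.8%
```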

Footnotes

  1. PaLM: Scaling Language Modeling with Pathways