I was invited to speak at ModSim 2026.

Title: A Demand-Side Approach to Modeling Data Path Requirements for Training

Abstract: Conventional wisdom holds that the I/O performance required to train frontier models is driven by the need to load training data as quickly as the GPUs can process it. However, this supply-side approach overestimates the bandwidth required for efficient training because it ignores how locality and computational intensity decouple data path demand from GPU throughput in practice. In this talk, we present the demand side of the design equation and critically examine what production training at scale actually requires of I/O infrastructure. We analyze 85,000 checkpoints, data loads, and restarts across 40 production training runs to characterize how frontier models interact with their data infrastructure throughout training. This analysis yields a simple framework for sizing the training data path from first principles, and it reveals that failure rate, not compute throughput, is the most critical input.