MLPerf Storage, IO500, and every other storage benchmarking effort are driven by storage practitioners trying to create benchmarks for other storage practitioners that are meant to be relevant to end users of storage. However, no end users are actually involved, so the resulting benchmarks wind up out of touch and out of date with what real workloads need.
The issue is that the way in which AI interacts with storage is relatively arbitrary. Like HPC practitioners, leading AI practitioners shape their I/O to match whatever offers the best performance. For example, a benchmark might read data as many whole files because that's what an AI framework like PyTorch does today. However, if that pattern performed poorly during real model training, the framework's data loading would simply be changed to perform its reads in whatever pattern offered the best performance.
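To make the "whole file" pattern concrete, here is a minimal sketch (not from the original text) of a map-style PyTorch Dataset whose `__getitem__` reads one entire file per sample; with a multi-worker DataLoader this produces many concurrent whole-file reads. The directory path, file extension, and class name are all hypothetical.

```python
# Sketch: a Dataset that reads one whole file per sample -- the I/O pattern
# a benchmark might copy because it is what the framework happens to do today.
from pathlib import Path

import torch
from torch.utils.data import Dataset, DataLoader


class WholeFileDataset(Dataset):
    def __init__(self, data_dir: str):
        # One sample per file; hypothetical directory layout.
        self.files = sorted(Path(data_dir).glob("*.bin"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # The entire file is read in a single call: a "whole file" read.
        raw = self.files[idx].read_bytes()
        return torch.frombuffer(bytearray(raw), dtype=torch.uint8)


if __name__ == "__main__":
    # Several workers issue these whole-file reads in parallel. If this
    # pattern were slow on real storage, the loading code (not the storage)
    # is what would get rewritten to use a faster access pattern.
    loader = DataLoader(WholeFileDataset("/path/to/dataset"),
                        batch_size=1, num_workers=4)
    for batch in loader:
        pass
```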
Storage benchmarks work in the enterprise because enterprise applications typically are not tailored to optimize for storage; the opposite happens: storage vendors optimize their platforms to deliver the best performance for enterprise applications. This is not true in HPC and AI. AI people know this, but storage people generally do not.
The reason benchmarks like MLPerf Storage are popular is that their goal isn't actually to make AI workloads faster; it's to let infrastructure people make infrastructure decisions without learning anything about the workloads that will run on that infrastructure. This sounds cynical (and it is), but it's not realistic for everyone making storage decisions to also be an expert in HPC/AI.