August 1, 2025
Conference Paper

Understanding and predicting cross-application I/O interference in HPC storage systems

Abstract

On High-Performance Computing (HPC) systems, where multiple concurrent workloads may read and write vast amounts of data stored on storage servers accessed through a shared network, competition for I/O resources between workloads is inevitable. Previous work has widely recognized this type of resource contention and highlighted its potential to significantly degrade the performance of individual applications. However, no prior work has quantitatively investigated the impact of inter-application I/O contention on individual applications, which hinders the design of more efficient resource provisioning strategies. In this work, we propose a framework that collects fine-grained I/O trace information from applications, together with concurrent server-side metrics, to train a neural network that accurately predicts the existence of I/O interference and its potential impact. Our results show that it is feasible to learn the complex factors, and the relationships among them, that give rise to I/O interference. We further show that the trained model can accurately predict I/O interference in HPC systems, achieving F1 scores above 90% on both synthetic benchmarks and real-world applications. We believe the proposed framework can serve as an important service in future HPC systems for handling I/O interference efficiently.
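The abstract reports prediction quality as F1 scores. As a rough illustration of what that metric measures, the sketch below computes per-class and macro-averaged F1 for interference predictions, using the standard identity F1 = 2TP / (2TP + FP + FN). The label names ("none", "mild", "severe") and the function names are hypothetical; the abstract does not specify the paper's label scheme or averaging method, so this is the generic metric definition, not the authors' evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def f1_per_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> dict:
    """Return the F1 score for every label seen in y_true or y_pred.

    Uses F1 = 2*TP / (2*TP + FP + FN), which equals 2PR / (P + R).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for label t
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-window interference labels, for illustration only.
    truth = ["none", "none", "mild", "severe", "mild", "none"]
    pred = ["none", "mild", "mild", "severe", "none", "none"]
    print(f1_per_class(truth, pred))
    print(round(macro_f1(truth, pred), 3))  # 0.722
```

For a binary "interference / no interference" setup, the F1 of the positive class alone is the usual headline number; macro averaging is shown only because a graded impact label would be multi-class.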

Published: August 1, 2025

Citation

Egersdoerfer C., M. Rashid, D. Dai, B. Fang, and N.R. Tallent. 2024. Understanding and predicting cross-application I/O interference in HPC storage systems. In Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2024-W), November 17-22, 2024, Atlanta, GA, 1330-1339. Piscataway, New Jersey: IEEE. PNNL-SA-198896. doi:10.1109/SCW63240.2024.00174
