April 25, 2025
Conference Paper

DaYu: Optimizing Distributed Scientific Workflows by Decoding Dataflow Semantics and Dynamics

Abstract

The combination of ever-growing scientific datasets and distributed workflow complexity creates I/O performance bottlenecks due to data volume, velocity, and variety. Although the increasing use of descriptive data formats (e.g., HDF5, netCDF) helps organize these datasets, it also creates obscure bottlenecks due to the need to translate high-level operations into file addresses and then into low-level I/O operations. To address this challenge, we introduce DaYu, a method and toolset for analyzing (a) semantic relationships between logical datasets and file addresses, (b) how dataset operations translate into I/O, and (c) the combination across entire workflows. DaYu's analysis and visualization enable identification of critical bottlenecks and reasoning about remediation. We describe our methodology and propose optimization guidelines. Evaluation on scientific workflows demonstrates up to 3.7x performance improvements in I/O time for obscure bottlenecks. The time and storage overhead for DaYu's time-ordered data is typically under 0.2% of runtime and 0.25% of data volume, respectively.

Citation

Tang M., J. Cernuda, J. Ye, L. Guo, N.R. Tallent, A. Kougkas, and X. Sun. 2024. DaYu: Optimizing Distributed Scientific Workflows by Decoding Dataflow Semantics and Dynamics. In IEEE International Conference on Cluster Computing (CLUSTER 2024), September 24-27, 2024, Kobe, Japan, 357-369. Piscataway, New Jersey: IEEE. PNNL-SA-201286. doi:10.1109/CLUSTER59578.2024.00038
