June 22, 2021
Conference Paper

Effectively Using Remote I/O For Work Composition in Distributed Workflows

Abstract

Distributed scientific workflows are becoming more important with growing interest in incorporating AI into their loops. A critical programming and performance question is how to compose workflow tasks when data is produced on one system but must be consumed on another. Since the dominant technique is composition with remote I/O, this paper explores its performance expectations. We describe BigFlowSim, a workflow I/O simulator that captures key implementation choices for remote I/O, including intensity, reuse, locality, access pattern, and data movement. With BigFlowSim, we generate a synthetic benchmark. We quantify the effects of each parameter with a performance sensitivity study. We explain trends in terms of data movement reduction and show that, under certain conditions, it is possible to establish a total order among most parameters. We apply these insights to a high-energy physics workflow, Belle II Monte Carlo, and simulate several I/O optimizations. Speedups range from 5% to 2× without changing compute time.

Published: June 22, 2021

Citation

Friese R.D., B. Mutlu, N.R. Tallent, J.D. Suetterlein, and J.F. Strube. 2020. Effectively Using Remote I/O For Work Composition in Distributed Workflows. In IEEE International Conference on Big Data (Big Data 2020), December 10-13, 2020, Atlanta, GA, 426-433. Piscataway, New Jersey: IEEE. PNNL-SA-155757. doi:10.1109/BigData50022.2020.9378352