Hiding communication behind useful computation is an important performance programming technique, but it remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. Bamboo reformulates MPI source into the form of a task dependency graph that expresses a partial ordering among tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic-level optimization against a well-known library.
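To make the idea of data-driven execution over a task dependency graph concrete, here is a minimal sketch in Python. It is a serial toy, not Bamboo's translator or runtime: the function `run_data_driven`, the task names, and the dependency table are all hypothetical and chosen only for illustration. It shows that when tasks are ordered solely by their dependencies, independent computation can run between the start and the completion of a data exchange.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Run tasks in an order consistent with the partial order in deps.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names it must wait for
    Returns the list of task names in the order they were executed.
    (Hypothetical helper for illustration; not part of Bamboo.)
    """
    # Count unmet dependencies per task and build the reverse edges.
    pending = {name: len(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, needs in deps.items():
        for need in needs:
            dependents[need].append(name)

    # A task becomes ready as soon as everything it waits for has run.
    ready = deque(name for name in tasks if pending[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()
        order.append(name)
        for child in dependents[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks never became ready")
    return order


log = []
# Assumed task names for a stencil-like step; each task only records that it ran.
tasks = {
    "start_exchange": lambda: log.append("start_exchange"),
    "compute_interior": lambda: log.append("compute_interior"),
    "finish_exchange": lambda: log.append("finish_exchange"),
    "compute_boundary": lambda: log.append("compute_boundary"),
}
# Only the boundary update needs the exchanged data; the interior does not.
deps = {
    "finish_exchange": {"start_exchange"},
    "compute_boundary": {"finish_exchange", "compute_interior"},
}
order = run_data_driven(tasks, deps)
```

In this sketch the interior computation has no dependency on the exchange, so the scheduler is free to run it before the exchange completes. A real runtime would additionally drive task readiness from message arrival, which this toy does not model.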
Revised: October 19, 2017
Published: August 1, 2017
Citation
Nguyen T., P. Cicotti, E.J. Bylaska, D. Quinlan, and S.B. Baden. 2017. Automatic Translation of MPI Source into a Latency-tolerant, Data-driven Form. Journal of Parallel and Distributed Computing 106. PNNL-SA-124997. doi:10.1016/j.jpdc.2017.02.009