Tensor contractions are extremely compute-intensive generalized matrix multiplication operations encountered in many computational science fields, such as quantum chemistry and nuclear physics. Unlike distributed matrix multiplication, which has been extensively studied, distributed tensor contractions remain comparatively under-explored. In this paper, we characterize distributed tensor contraction algorithms on torus networks. We develop a framework with three fundamental communication operators that generates communication-efficient contraction algorithms for arbitrary tensor contractions. We show that, for a given amount of memory per processor, our framework is communication optimal for all tensor contractions. We demonstrate the performance and scalability of our framework on up to 262,144 cores of a BG/Q supercomputer using five tensor contraction examples.
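To make concrete what "generalized matrix multiplication" means here, the following is a minimal sketch (not taken from the paper; the index equation and sizes are illustrative assumptions) showing a tensor contraction in NumPy, where shared indices between the inputs are summed over and the remaining indices stay free:

```python
import numpy as np

# Illustrative contraction (hypothetical example, not from the paper):
#   C[i, j, m] = sum over k, l of A[i, k, l, m] * B[l, j, k]
# The contracted indices k and l play the role of the summation index
# in an ordinary matrix multiply; i, j, m remain free in the output.
I, J, K, L, M = 4, 5, 6, 7, 3
A = np.random.rand(I, K, L, M)
B = np.random.rand(L, J, K)

# einsum expresses the contraction directly from the index equation.
C = np.einsum('iklm,ljk->ijm', A, B)

# The same result via explicit loops, exposing the generalized
# matrix-multiply structure (inner sums over the contracted indices).
C_ref = np.zeros((I, J, M))
for i in range(I):
    for j in range(J):
        for m in range(M):
            for k in range(K):
                for l in range(L):
                    C_ref[i, j, m] += A[i, k, l, m] * B[l, j, k]

assert np.allclose(C, C_ref)
```

The distributed algorithms characterized in the paper target contractions of this general form, where both inputs and the output may be partitioned across the processors of a torus network.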
Revised: April 17, 2015 | Published: November 16, 2014
Citation
Rajbhandari, S., A. Nikam, P. Lai, K. Stock, S. Krishnamoorthy, and P. Sadayappan. 2014. A Communication-Optimal Framework for Contracting Distributed Tensors. In International Conference for High Performance Computing, Storage and Analysis (SC14), November 16-21, 2014, New Orleans, Louisiana, 375-386. Piscataway, New Jersey: IEEE. PNNL-SA-103670. doi:10.1109/SC.2014.36