April 16, 2022
Conference Paper

Load-balanced sparse MTTKRP on GPUs

Abstract

Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP for floating point operations, storage, and scalability. We begin by identifying the performance bottlenecks in directly extending the state-of-the-art CSF (compressed sparse fiber) formats from CPUs to GPUs. Our detailed analysis over the recently proposed formats shows that the lower bounds on storage and flop counts can vary significantly depending on the structure of the sparse tensor. To address this, we propose a load balanced, computation and storage-efficient scheme, HYB, which combines the best of COO (coordinate), CSF and CSL (compressed slice). With these enhancements, our GPU framework significantly outperforms the current formats on both CPU and GPU platforms.
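For readers unfamiliar with the kernel, the following is a minimal sketch of a mode-1 MTTKRP over a third-order sparse tensor stored in COO format (the baseline coordinate format the abstract contrasts against). The function name, argument shapes, and sequential loop are illustrative only; they are not taken from the paper, which concerns parallel GPU formats.

```python
import numpy as np

def mttkrp_coo(inds, vals, B, C, I):
    """Mode-1 MTTKRP for a 3rd-order sparse tensor X in COO format.

    inds: (nnz, 3) integer array of (i, j, k) coordinates of nonzeros
    vals: (nnz,)   nonzero values of X
    B, C: dense factor matrices of shapes (J, R) and (K, R)
    I:    size of the first tensor mode
    Returns M of shape (I, R) where M[i, :] = sum over nonzeros (i, j, k)
    of X[i, j, k] * (B[j, :] * C[k, :]) (elementwise Hadamard product).
    """
    R = B.shape[1]
    M = np.zeros((I, R))
    for (i, j, k), v in zip(inds, vals):
        # Each nonzero contributes a rank-R row update to its output slice.
        M[i] += v * (B[j] * C[k])
    return M
```

The per-nonzero row update is what makes load balancing hard on GPUs: in COO every nonzero costs the same, but slice/fiber-compressed formats like CSF amortize index storage at the cost of irregular work per fiber.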


Citation

Nisa, I., J. Li, A. Sukumaran-Rajam, R. Vuduc, and P. Sadayappan. 2019. Load-balanced sparse MTTKRP on GPUs. In IEEE 33rd International Parallel and Distributed Processing Symposium (IPDPS 2019), May 20-24, 2019, Rio de Janeiro, Brazil, 123-133. Los Alamitos, California: IEEE Computer Society. PNNL-SA-138752. doi:10.1109/IPDPS.2019.00023