May 3, 2006
Conference Paper

Memory Efficient Parallel Matrix Multiplication Operation for Irregular Problems

Abstract

Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applications, the matrix distribution is based on the underlying physical problem and might involve variable block sizes on individual processors. This paper describes a generalization of the Shared and Remote-memory based Universal Matrix Multiplication Algorithm (SRUMMA) [1] to handle irregularly distributed matrices. Our approach relies on a distribution-independent algorithm that provides dynamic load balancing by exploiting data locality, and it achieves performance as good as the traditional approach, which handles the irregular case by allocating temporary arrays with a regular distribution, redistributing the data, and then applying a regular matrix multiplication. The proposed algorithm is memory-efficient because no temporary matrices are needed. This feature is critical for systems such as the IBM Blue Gene/L, which offer a very limited amount of memory per node. The experimental results demonstrate very good performance across the range of matrix distributions and problem sizes motivated by real applications.
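To make the idea concrete, the following is a minimal, hypothetical sketch (not the paper's code) of block-wise matrix multiplication over irregular block boundaries: each output block C(i,j) is accumulated from A(i,k) x B(k,j) block products, one "task" per block triple, with no temporary redistributed copies of A or B. In the actual parallel algorithm these tasks would be scheduled across processors for data locality; here they are simply looped over serially.

```python
def matmul(a, b):
    """Reference dense multiply of two list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def block_matmul(a, b, row_cuts, mid_cuts, col_cuts):
    """Multiply a and b block by block using irregular cut points.

    row_cuts / mid_cuts / col_cuts are lists of boundary indices such as
    [0, 1, 4]; consecutive pairs delimit blocks of unequal size, which
    models the variable block sizes of an irregular distribution.
    """
    n, p = len(a), len(b[0])
    c = [[0] * p for _ in range(n)]
    # One task per (i, j, k) block triple: accumulate A(i,k) x B(k,j)
    # directly into the owning C(i,j) block, with no temporary matrix.
    for ri in range(len(row_cuts) - 1):
        for cj in range(len(col_cuts) - 1):
            for mk in range(len(mid_cuts) - 1):
                for i in range(row_cuts[ri], row_cuts[ri + 1]):
                    for j in range(col_cuts[cj], col_cuts[cj + 1]):
                        c[i][j] += sum(a[i][k] * b[k][j]
                                       for k in range(mid_cuts[mk],
                                                      mid_cuts[mk + 1]))
    return c
```

Because the blocked accumulation visits exactly the same scalar products as a dense multiply, the block result matches the reference result for any valid set of cut points, regular or irregular.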

Revised: April 20, 2011 | Published: May 3, 2006

Citation

Krishnan M., and J. Nieplocha. 2006. Memory Efficient Parallel Matrix Multiplication Operation for Irregular Problems. In Proceedings of the 3rd Conference on Computing Frontiers (CF '06), Ischia, Italy, May 3-5, 2006, 229-240. New York, NY: ACM Press. PNNL-SA-47061.