In the graph data structure like adjacency matrix, the connectivity of two nodes can be sufficiently represented using only 1 bit, but they are generally treated as 32-bit full-precision in state-of-the-art graph frameworks to adopt common sparse format such as CSR. Meanwhile, bit-level parallelism has recently be explored to have high-performance potential and low storage requirement on GPUs with dense bit-tiles. To fill the gap, our solution is a hierarchical storage format that contains the bit-indexing base and dense bit-tile units. Inherently, the granularity of the bit-tile is an essential factor in achieving both storage compression and GPU parallelism. How to find a sweet spot that trades off between avoiding sparsity and exploiting is comprehensively researched in this work. In the experiment, we evaluate the proposed storage format and algorithms on modern generation GPUs, including Pascal and Volta, to figure out critical software co-designs in conjunction with existing hardware-specific optimization.
Published: September 24, 2022
Citation
Chen J., N.R. Tallent, K.J. Barker, X. Shen, H. Sung, and A. Li. 2022.Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU. In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2022), May 30-June 03, 2022, Virtual, Online, 515-525. Los Alamitos, California:IEEE Computer Society.PNNL-SA-161317.doi:10.1109/IPDPS53621.2022.00056