We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
Revised: March 19, 2010 |
Published: May 29, 2009
Citation
Madduri K., K. Madduri, D. Ediger, D. Ediger, K. Jiang, D.A. Bader, and D.A. Bader, et al. 2009.A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), May 23-29, 2009, Rome, Italy. Los Alamitos, California:IEEE Computer Society.PNNL-SA-64069.