Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require good initial load balance coupled with efficient periodic rebalancing. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed-memory machines. Both algorithms are shown to incur low overheads and achieve over 90% efficiency on 76,800 cores.
Revised: February 19, 2016
Published: June 18, 2012
Citation
Lifflander J., S. Krishnamoorthy, and L. Kale. 2012. Work Stealing and Persistence-based Load Balancers for Iterative Overdecomposed Applications. In HPDC 2012: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, June 18-22, 2012, Delft, The Netherlands, 137-148. New York, New York: Association for Computing Machinery (ACM). PNNL-SA-86555. doi:10.1145/2287076.2287103