Performance of distributed graph algorithms can
benefit greatly from close coordination between the algorithmic abstraction
and the underlying runtime system that is responsible for
scheduling work and exchanging messages. However, due to
the dynamic and irregular nature of their computation, distributed
graph algorithms written in different programming models
impose varying degrees of workload pressure on the runtime.
To cope with such vastly different workload characteristics, a
runtime has to make several trade-offs. One such trade-off
arises, for example, when the runtime scheduler has to choose
among alternatives such as executing algorithmic
work, progressing the network by probing network buffers,
or throttling the sending of messages (termed flow control). This trade-off
decides between optimizing the throughput of the runtime
scheduler, by increasing the rate at which algorithmic
work is executed, and reducing the latency of network messages.
Another trade-off exists when a decision has to be made
about when to send the aggregated messages held in buffers (message
coalescing). This decision trades latency
for network bandwidth and vice versa. At any instant, such
trade-offs emphasize either improving the quantity of work
being executed (by maximizing scheduler throughput) or
improving the quality of work (by prioritizing better work).
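To make these trade-offs concrete, the following is a minimal, self-contained sketch of a scheduler loop that interleaves algorithmic work with outgoing message coalescing. It is purely illustrative: the names (CoalescingBuffer, flush_threshold, Message) and the fixed flush policy are assumptions for exposition, not the interface or policy of any particular runtime discussed here.

```cpp
// Illustrative sketch only: a coalescing buffer with a static flush threshold,
// driven by a simple scheduler loop.  A larger threshold favors bandwidth,
// a smaller one favors latency.
#include <cstddef>
#include <iostream>
#include <queue>
#include <vector>

struct Message { int target_vertex; int payload; };

class CoalescingBuffer {
public:
  explicit CoalescingBuffer(std::size_t flush_threshold)
      : flush_threshold_(flush_threshold) {}

  // Buffer a message; hand the batch to the (stubbed) transport layer
  // once the threshold is reached.
  void send(const Message& m) {
    buffer_.push_back(m);
    if (buffer_.size() >= flush_threshold_) flush();
  }

  void flush() {
    if (buffer_.empty()) return;
    std::cout << "flushing batch of " << buffer_.size() << " messages\n";
    buffer_.clear();  // stand-in for the actual network send
  }

private:
  std::size_t flush_threshold_;
  std::vector<Message> buffer_;
};

int main() {
  std::queue<int> work;          // pending algorithmic work (vertex ids)
  for (int v = 0; v < 10; ++v) work.push(v);

  CoalescingBuffer out(4);       // static policy: flush every 4 messages

  // Scheduler loop: execute work, which generates outgoing messages.
  // A real runtime would also poll receive buffers and apply flow control
  // (throttling sends) inside this loop.
  while (!work.empty()) {
    int v = work.front(); work.pop();
    out.send({v, v * v});
  }
  out.flush();                   // drain the remainder at termination
}
```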
However, encoding static policies for different runtime features
(such as flow control and coalescing) can prevent graph algorithms
from achieving their full potential and thus undermine
their actual performance. In this
paper, we investigate runtime support for distributed graph
algorithms in the context of two paradigms: variants of the well-known
Bulk-Synchronous Parallel model and the asynchronous
programming model. We explore generic runtime features
such as message coalescing (aggregation) and flow control,
and show that the execution policies of these features need to be
adjusted over time to have a positive impact on the execution
time of a distributed graph algorithm. Since synchronous
and asynchronous graph algorithms have different workload
characteristics, not all such runtime features are good
candidates for adaptation; each of these algorithmic paradigms
may require a different set of features to be adapted over time.
We demonstrate which feature(s) can be useful in each
case to achieve the right balance of work in the runtime
layer. Existing implementations of different graph algorithms
can benefit from adopting dynamic policies in the underlying
runtime.
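As a rough illustration of what a time-varying policy could look like, the sketch below periodically re-tunes a coalescing flush threshold from observed flush behavior. The class name, the timeout-fraction signal, and the halving/doubling rule are all hypothetical and chosen only to show the shape of such adaptation; they do not reproduce the scheme evaluated in the paper.

```cpp
// Hypothetical adaptive policy: each epoch, shrink the coalescing threshold
// when most flushes were triggered by timeouts (buffers not filling, so
// favor latency), and grow it when flushes were threshold-triggered
// (message-heavy phase, so favor bandwidth).
#include <algorithm>
#include <cstddef>
#include <initializer_list>
#include <iostream>

class AdaptiveThreshold {
public:
  AdaptiveThreshold(std::size_t min_t, std::size_t max_t, std::size_t init)
      : min_(min_t), max_(max_t), threshold_(init) {}

  // Called once per adaptation epoch with the fraction of flushes in that
  // epoch that were triggered by a timeout rather than a full buffer.
  void adapt(double timeout_flush_fraction) {
    if (timeout_flush_fraction > 0.5)
      threshold_ = std::max(min_, threshold_ / 2);   // favor latency
    else
      threshold_ = std::min(max_, threshold_ * 2);   // favor bandwidth
  }

  std::size_t value() const { return threshold_; }

private:
  std::size_t min_, max_, threshold_;
};

int main() {
  AdaptiveThreshold policy(16, 4096, 256);
  // Simulated epochs: early on, message production is sparse (many timeout
  // flushes); later, the algorithm floods the network.
  for (double f : {0.8, 0.7, 0.2, 0.1}) {
    policy.adapt(f);
    std::cout << "threshold -> " << policy.value() << "\n";
  }
}
```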
Revised: May 22, 2019 | Published: December 17, 2018

Citation
Firoz, J. S., M. J. Zalewski, J. D. Suetterlein, and A. Lumsdaine. 2018. Adaptive Runtime Features For Distributed Graph Algorithms. In IEEE 25th International Conference on High Performance Computing (HiPC 2018), December 17-20, 2018, Bengaluru, India, 82-91. Los Alamitos, California: IEEE Computer Society. PNNL-SA-138864. doi:10.1109/HiPC.2018.00018