Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems in which a large virtually-shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit the performance scaling of such architectures. Modern high-performance DSM systems have evolved toward massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots, and scale to large configurations. To model the performance of such large-scale machines, parallel simulation has proven to be a promising approach for achieving good accuracy in reasonable time. One of the most critical factors in resolving the trade-off between simulation speed and accuracy is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated into a full-system XMT simulator. We start by measuring the effects of network contention on a 128-processor XMT machine and then investigate the trade-off between simulation accuracy and speed by comparing three network models that operate at different levels of accuracy. The comparison and the model validation are performed by executing a string-matching algorithm on both the full-system simulator and the XMT, using three datasets that generate noticeably different contention patterns.
Revised: August 26, 2011
Published: July 27, 2011
Citation
Secchi S., A. Tumeo, and O. Villa. 2011. Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT. In 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2011), May 23-26, 2011, Newport Beach, California, 275-284. Los Alamitos, California: IEEE Computer Society. PNNL-SA-76834. doi:10.1109/CCGrid.2011.39