May 31, 2012
Conference Paper

A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications

Abstract

This paper presents an architecture template for next-generation high-performance computing systems specifically targeted at irregular applications. We start from the expectation that full-system interconnection and memory bandwidth will grow by a factor of 10 in future generations. To keep up with this communication capacity while still relying on fine-grained multithreading as the main way to tolerate the unpredictable memory access latencies of irregular applications, we show how overall performance scaling can benefit from the multi-core paradigm. We also show that such an architecture template must be coupled with specific techniques to optimize bandwidth utilization and achieve maximum scalability. As one such technique, we propose memory reference aggregation, together with its hardware implementation. We explore the proposed architecture template by focusing on the Cray XMT architecture and, using a dedicated simulation infrastructure, validate the performance of our template with two typical irregular applications. Our experimental results demonstrate the benefits of both the multi-core approach and the bandwidth-optimizing reference aggregation technique.

Revised: September 3, 2013 | Published: May 31, 2012

Citation

Secchi S., A. Tumeo, and O. Villa. 2012. A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications. In 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), May 13-16, 2012, Ottawa, Ontario, Canada, 580-587. Los Alamitos, California: IEEE Computer Society. PNNL-SA-79924. doi:10.1109/CCGrid.2012.53