April 26, 2005
Conference Paper

Optimizing Performance on Linux Clusters Using Advanced Communication Protocols: Achieving Over 10 Teraflops on a 8.6 Teraflops Linpack-Rated Linux Cluster

Abstract

Advancements in high-performance networks (Quadrics, InfiniBand, or Myrinet) continue to improve the efficiency of modern clusters. However, the average application still achieves only a small fraction of the system's peak performance. This paper describes techniques for optimizing application performance on Linux clusters using Remote Memory Access (RMA) communication protocols. The effectiveness of these optimizations is presented in the context of an application kernel, dense matrix multiplication. The result was over 10 teraflops on an HP Linux cluster whose measured LINPACK performance is 8.6 teraflops.

Revised: May 19, 2011 | Published: April 26, 2005

Citation

Krishnan M., and J. Nieplocha. 2005. Optimizing Performance on Linux Clusters Using Advanced Communication Protocols: Achieving Over 10 Teraflops on a 8.6 Teraflops Linpack-Rated Linux Cluster. In The 6th International Conference on Linux Clusters, The HPC Revolution, April 25-28, 2005, Chapel Hill, North Carolina. Albuquerque, New Mexico: Linux Cluster Institute. PNNL-SA-44417.