The Cray Gemini Interconnect has been recently introduced as a next generation network architecture for building multi-petaflop supercomputers. Cray XE6 systems including LANL Cielo, NERSC Hopper, ORNL Titan and proposed NCSA BlueWaters leverage the Gemini Interconnect as their primary Interconnection network. At the same time, programming models such as the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) and Co-Array Fortran (CAF) have become available on these systems. Global Arrays is a popular PGAS model used in a variety of application domains including hydrodynamics, chemistry and visualization. Global Arrays uses Aggregate Re- mote Memory Copy Interface (ARMCI) as the communication runtime system for Remote Memory Access communication. This paper presents a design, implementation and performance evaluation of scalable and high performance communication subsystems on Cray Gemini Interconnect using ARMCI. The design space is explored and time-space complexities of commu- nication protocols for one-sided communication primitives such as contiguous and uniformly non-contiguous datatypes, atomic memory operations (AMOs) and memory synchronization is presented. An implementation of the proposed design (referred as ARMCI-Gemini) demonstrates the efficacy on communication primitives, application kernels such as LU decomposition and full applications such as Smooth Particle Hydrodynamics (SPH) application.
Revised: July 30, 2013 |
Published: December 26, 2012
Citation
Vishnu A., J.A. Daily, and B.J. Palmer. 2012.Designing Scalable PGAS Communication Subsystems on Cray Gemini Interconnect. In 19th International Conference on High Performance Computing (HiPC), December 18-22, 2012, Pune, India. Piscataway, New Jersey:Institute of Electrical and Electronics Engineers.PNNL-SA-89704.doi:10.1109/HiPC.2012.6507506