Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality, and development of multilevel parallelism based on processor groups in the context of the Global Arrays (GA) shared memory programming model. The main effort involves management of shared data rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient (CG) benchmark and a molecular dynamics (MD) application, obtained on a Linux cluster with Myrinet, illustrate the value of the proposed approach for improving scalability. Whereas the original GA version of the CG benchmark lagged behind MPI, the processor-group version outperforms MPI in all cases except for a few points on the smallest problem size. Similarly, the group version of the MD application improves execution time by 58% on 32 processors.
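To make the programming model concrete, the following is a minimal sketch of how a computation might be restricted to a processor group using the C bindings of the Global Arrays toolkit (GA_Pgroup_create, GA_Pgroup_set_default, GA_Pgroup_sync). The two-way split of the world group, the array size, and the array name are purely illustrative and are not taken from the paper.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    GA_Initialize();

    int me     = GA_Nodeid();   /* rank in the world group   */
    int nprocs = GA_Nnodes();   /* total number of processes */

    /* Illustrative split of the world group into two halves,
       each becoming its own GA processor group.              */
    int half  = nprocs / 2;
    int color = (me < half) ? 0 : 1;
    int size  = (color == 0) ? half : nprocs - half;
    int *list = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++)
        list[i] = (color == 0) ? i : half + i;

    int grp = GA_Pgroup_create(list, size);  /* handle for my subgroup */
    free(list);

    /* Arrays created while this group is the default are distributed
       only over its members, so shared-data management and collective
       operations stay within the group.                               */
    GA_Pgroup_set_default(grp);
    int dims[1] = {1000};
    int g_a = NGA_Create(C_DBL, 1, dims, "work", NULL);

    GA_Zero(g_a);            /* collective over the subgroup only     */
    GA_Pgroup_sync(grp);     /* barrier restricted to the subgroup    */

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```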
Revised: May 19, 2011
Published: May 4, 2005
Citation
Nieplocha, J., M. Krishnan, B. J. Palmer, V. Tipparaju, and Y. Zhang. 2005. Exploiting Processor Groups to Extend Scalability of the GA Shared Memory Programming Model. In Proceedings of the 2nd Conference on Computing Frontiers, May 4-6, Ischia, Italy, 262-272. New York, New York: Association for Computing Machinery. PNNL-SA-44286. doi:10.1145/1062261.1062305