May 20, 2021
Conference Paper

Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples

Abstract

The CCSD(T) coupled-cluster model with perturbative triples is considered a gold standard for computational modeling of the correlated behavior of electrons in molecular systems. A fundamental constraint is that GPU global memory is small relative to the main memory on host nodes, which forces smaller tile sizes for the high-dimensional tensor contractions in NWChem's GPU-accelerated implementation of the CCSD(T) method. We describe a coordinated redesign that addresses this limitation and the associated data-movement overheads, including a novel fused GPU kernel for a set of tensor contractions, along with inter-node communication optimization and data caching. The new implementation of GPU-accelerated CCSD(T) improves overall performance by 3.4x. Finally, we discuss the trade-offs in using this fused algorithm on current and future supercomputing platforms.

Published: May 20, 2021

Citation

Kim J., A.R. Panyala, B. Peng, K. Kowalski, P. Sadayappan, and S. Krishnamoorthy. 2020. Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC2020), November 9-19, 2020, Atlanta, GA, 1-15. Piscataway, New Jersey: IEEE. PNNL-SA-154438. doi:10.1109/SC41405.2020.00083