# Computational Sciences & Mathematics Division

Research Highlights

April 2009

## New Algorithms Will Make Fast Work of Molecular Modeling

### PNNL has opportunity to define new standards for molecular simulations

In coming years, highly scalable implementations of *ab-initio* theories capable of utilizing the power of hundreds of thousands of CPUs will define new standards for molecular simulations.

**Results:** Researchers at Pacific Northwest National Laboratory have demonstrated the scalability of new parallel algorithms being developed for high-performance computers. The team is developing a new generation of algorithms for molecular modeling that will make effective use of emerging petascale and extreme-scale architectures. Key advances in molecular simulations are being pursued in several areas: coupled-cluster (CC) algorithms describing excited-state correlation effects, algorithms for parallel-in-time dynamical calculations, and multiscale approaches for strongly interacting systems. During the project's first phase, researchers demonstrated the scalability of high-level excited-state CC approaches and parallel-in-time algorithms across several thousand CPUs using the EMSL HP Linux cluster with an InfiniBand network.

**Why It Matters:** To effectively attack major scientific problems in energy, the environment, health, and national security, researchers need state-of-the-art algorithms that run effectively on emerging petascale computer architectures and on forthcoming exascale machines. New advanced algorithms will make modeling and simulation feasible on these large, massively parallel computers.

Researchers at PNNL are targeting algorithms capable of describing the behavior of molecules in excited states, along with novel algorithms for time upscaling of molecular dynamics theories. These methods will be used to calculate the properties of solid-state and molecular systems and to predict structures, properties, and reactions for a wide variety of systems important to solar, hydrogen storage, catalytic, nuclear, and environmental remediation technologies.

PNNL researchers are also looking toward integrated multiscale approaches for modeling chemical processes in realistic settings defined by finite temperatures and pressures. These approaches can capitalize on highly scalable implementations of *ab-initio* methodologies, which will allow high-level descriptions of large molecular systems under those conditions.

Progress in this field will significantly raise the system-size limit that computer simulations can handle and will set new standards for the accuracy attainable in molecular simulations. In particular, researchers will use new, highly scalable codes to describe energy conversion in light-harvesting (photosynthetic) molecular systems and to simulate the structure, dynamics, and reactions at the mineral (Fe₂O₃)/solution interface as a function of pressure and temperature.

**Method:** PNNL's research team is using improved communication schemes, exploiting more efficient data localization patterns, and taking advantage of multi-level parallelism in designing new-generation algorithms for the extreme scale.

Completion of the first phase is significant because it clearly demonstrates that the proposed algorithms can scale across 10,000 CPUs. More extensive tests, using a much larger number of CPUs, are planned this year.

Significant progress in the scalability of a class of methods known as non-iterative CC methods for excited states was achieved thanks to several factors, including redefined local memory management, a new global addressing strategy for handling large data sets stored in Global Arrays, and more efficient ways of dealing with the complex, data-intensive expressions that define the CC equations. These improvements significantly increased the performance of the active-space version of the non-iterative CC methods, which account for the triply excited configurations that are important for the accuracy of the results.

Another important issue addressed during the project's first phase was the localization of certain classes of large data sets used by the formalism. Similar progress, although achieved on a different basis, was made for parallel-in-time algorithms, which required the use of multi-level parallelism and the development of highly efficient methods for the *ab-initio* molecular dynamics part of the algorithm. In particular, a new algorithm for hybrid density functional theory was implemented, and an efficient task-level parallelization scheme was developed using a high-level programming language.
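To make the parallel-in-time idea concrete, here is a minimal sketch of parareal, one well-known parallel-in-time scheme. The highlight does not say which scheme the PNNL team used, so the choice of parareal, the toy decay equation, and every name in the code (`euler`, `parareal`, `serial_fine`) are illustrative assumptions rather than the project's implementation. A cheap coarse propagator sweeps through the time slices serially, while the expensive fine propagator runs on each slice independently, which is the step that could be distributed across processors.

```python
def euler(f, u, t0, t1, steps):
    """Integrate du/dt = f(t, u) from t0 to t1 with explicit Euler steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        u = u + h * f(t, u)
        t += h
    return u


def parareal(f, u0, t_end, n_slices, n_iter, coarse_steps=1, fine_steps=100):
    """Hypothetical parareal sketch; returns the solution at each slice boundary."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]

    def coarse(u, n):  # cheap, inaccurate propagator over slice n
        return euler(f, u, ts[n], ts[n + 1], coarse_steps)

    def fine(u, n):  # expensive, accurate propagator over slice n
        return euler(f, u, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one serial sweep with the coarse propagator.
    U = [u0]
    for n in range(n_slices):
        U.append(coarse(U[n], n))

    for _ in range(n_iter):
        # These fine solves are independent of each other, so a real code
        # could run one per processor group. Here they run serially.
        fine_vals = [fine(U[n], n) for n in range(n_slices)]
        coarse_old = [coarse(U[n], n) for n in range(n_slices)]

        # Serial correction sweep: new coarse prediction plus the
        # fine-minus-coarse correction from the previous iterate.
        new = [u0]
        for n in range(n_slices):
            new.append(coarse(new[n], n) + fine_vals[n] - coarse_old[n])
        U = new
    return U


def serial_fine(f, u0, t_end, n_slices, fine_steps=100):
    """Reference: the fine propagator applied slice after slice, serially."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]
    U = [u0]
    for n in range(n_slices):
        U.append(euler(f, U[n], ts[n], ts[n + 1], fine_steps))
    return U


if __name__ == "__main__":
    decay = lambda t, u: -u  # toy problem, not a molecular dynamics force
    ref = serial_fine(decay, 1.0, 2.0, 8)
    for k in (0, 2, 8):
        approx = parareal(decay, 1.0, 2.0, 8, k)
        print(k, abs(approx[-1] - ref[-1]))
```

In this sketch, each iteration costs roughly one fine solve over a single slice when the slices run concurrently, so the scheme pays off when far fewer iterations than slices are needed. A production code would replace the toy propagators with molecular dynamics integrators and could add a second level of parallelism inside each fine solve.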

**What's Next:** Researchers continue to develop next-generation algorithms for massively parallel computers. During the second phase, the researchers plan to address the following problems:

- Find an efficient solution to local memory bottlenecks characterizing correlation effects at different orders; further reduce communication by localizing higher-rank tensors in coupled-cluster approaches for excited states; and characterize the performance of the resulting codes on computers with 50,000 to 100,000 CPUs.
- Develop exascale parallel-in-time algorithms for use with terascale *ab-initio* molecular dynamics and terascale molecular dynamics programs.
- Develop an efficient interface between *ab-initio* theories and an adaptive multiscale simulation module.

"At the completion of this project, we expect to have a suite of massively parallel tools to perform excited-state calculations for molecular systems composed of hundreds of atoms and new algorithms to perform dynamic simulations for much longer propagation times," says PNNL researcher Karol Kowalski. "Another important outcome is closely related to adaptive algorithms for inter-atomic potentials capable of incorporating the changes in chemical structure of the surrounding environment."

**Acknowledgments:** The Pacific Northwest National Laboratory strengthens U.S. scientific foundations for innovation by creating computational capabilities to solve problems using extreme-scale simulation and petascale data analytics.

**Sponsor:** The project is supported by the Pacific Northwest National Laboratory eXtreme Scale Computing Initiative.

**EMSL involvement:** Some of this work was conducted in the Environmental Molecular Sciences Laboratory, a Department of Energy national scientific user facility located at the Pacific Northwest National Laboratory.

**Research team:** Karol Kowalski; Eric Bylaska; Marat Valiev