October 13, 2020

Opening the Black Box of Neural Networks

PNNL researchers peer into water clusters database, train network to predict energy landscapes

water molecule

PNNL researchers worked with the molecular, gaseous form of water, finding information about hydrogen bonds and structural patterns.

Image: Egorov Artem | Shutterstock

Machine learning algorithms, the basis of neural networks, are opening doors to new discoveries, or at least offering tantalizing clues, one massive database at a time. Case in point: Pacific Northwest National Laboratory (PNNL) researchers delved deeply into modeling the interactions among water molecules, finding information about hydrogen bonds and structural patterns while plowing a path using (you guessed it) deep learning.

“Neural networks are a way for the computer to automatically learn different properties of systems or data,” said PNNL data scientist Jenna Pope. “In this case, the neural network learns the energy of different water cluster networks based on previous data.”

PNNL researchers used 500,000 water clusters from a recently developed database of over 5 million water cluster minima to train a neural network relying on the mathematical power of graph theory (a collection of nodes and links representing molecular structure) to decipher structural patterns of the aggregation of water molecules. Working with the molecular, gaseous form of water, they paid particular attention to the relation between hydrogen bonding and energy relative to the most stable structure.
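To make the graph-theory idea concrete, here is a minimal sketch (not the study's actual pipeline) of how a water cluster can be cast as a graph: each node is a water molecule, and an edge is drawn between two molecules whose oxygen atoms lie within a hydrogen-bonding distance. The 3.5-angstrom cutoff and the trimer coordinates below are illustrative assumptions, not values taken from the paper.

```python
import math

def cluster_graph(oxygen_xyz, cutoff=3.5):
    """Build an adjacency list for a water cluster.

    Nodes are water molecules (indexed by position in oxygen_xyz);
    an edge links two molecules whose O-O distance is below the
    cutoff, a common geometric proxy for a hydrogen bond.
    """
    n = len(oxygen_xyz)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(oxygen_xyz[i], oxygen_xyz[j]) < cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

# Hypothetical oxygen coordinates (angstroms) for a water trimer.
trimer = [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (1.4, 2.4, 0.0)]
graph = cluster_graph(trimer)

# Each undirected edge appears twice in the adjacency list.
n_hbonds = sum(len(neighbors) for neighbors in graph.values()) // 2
```

A neural network trained on such graphs, paired with each cluster's computed energy, can then learn to map hydrogen-bond topology to energy, which is the setup the study describes.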

“That’s the holy grail,” said Pope. “Right now, it takes a lot of effort to develop an accurate interaction potential for water. But with neural networks, the eventual goal is to have the networks learn, from a large data set, what is the energy of a network based on its underlying molecular structure.”

After sizing up 500,000 water clusters, the neural network in the PNNL-led study was trained in the various ways water molecules interact with each other. The data set theoretically could have included the whole database of 5 million water networks. But for practical reasons it didn’t.

“Using the whole database to train the neural network would have taken too much computational time,” said Pope. Training the deep neural networks to examine those 500,000 water clusters, just one-tenth of the full database, took more than two and a half days using four state-of-the-art computers with sizable graphics processing unit (GPU) performance, working around the clock.

water molecule illustration
PNNL researchers used 500,000 water clusters from a recently developed database of over 5 million water cluster minima to train a neural network relying on the mathematical power of graph theory (a collection of nodes and links representing molecular structure) to decipher structural patterns of the aggregation of water molecules. Graphic: Nathan Johnson | PNNL

Part of a bigger picture

Neural networks have been around for decades. Greater processing power of GPU chips over the past 10 years, however, has sharply advanced the capability of these networks, also associated with the phrase “deep learning.” Even with such promise, training neural networks is an expensive computational challenge. And as accurate as they might be, neural networks are often criticized as black boxes that offer no information about why they are giving the answer they do.

The U.S. Department of Energy’s (DOE’s) Exascale Computing Project (ECP) was launched in 2016 to explore the most intractable supercomputing problems, including the refinement of neural networks. In 2018, ECP spawned the ExaLearn Co-Design Center, focusing on machine learning technologies. PNNL is among eight national laboratories taking part in the ExaLearn project. James Ang, PNNL’s chief scientist for computing in Physical and Computational Sciences, leads the Laboratory’s participation.

Database close to home

One of ExaLearn’s major goals is to develop artificial intelligence technologies that can design new chemical structures by learning from massive data sets. Research led by Sutanay Choudhury, a PNNL computer scientist, tapped into the massive water clusters database developed at the PNNL-Richland campus by Sotiris Xantheas, a PNNL Laboratory fellow. Xantheas, known in chemical physics for his research in intermolecular interactions in aqueous ionic clusters, is a co-author on the neural networks study published in the special issue “Machine Learning Meets Chemical Physics” of the Journal of Chemical Physics.

“Several macroscopic properties of water have been attributed to its fleeting hydrogen bonding network, which consists of a dynamic network of bonds that break and reform in a fraction of a second at room temperature,” said Xantheas, whose database work was supported by DOE’s Office of Science, Basic Energy Sciences program, Chemical Sciences, Geosciences, and Biosciences Division. “Water clusters provide a testbed for probing this fleeting hydrogen bonding network by understanding the structure-energy relation of the different hydrogen bonding arrangements.”

PNNL’s researchers had a strategy to decipher this particular black box. They used graph theory, a branch of mathematics that studies how things are connected in a network, to represent, in graphic form, molecules and their polygon substructures. The graph-theoretical descriptors the team devised provided several insights into the water clusters’ makeup.
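One simple example of such a descriptor is a count of polygon substructures, such as three-membered rings, in the hydrogen-bond graph. The sketch below (an illustration of the general idea, not the study's actual descriptor set) counts triangles in a hypothetical four-molecule network given as an adjacency list.

```python
from itertools import combinations

def count_triangles(adj):
    """Count three-membered rings: unordered triples of nodes
    that are all mutually connected in the adjacency list."""
    count = 0
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count

# Hypothetical hydrogen-bond network: four molecules arranged as
# a square with one diagonal bond, giving two triangular rings.
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
ring_count = count_triangles(adj)
```

Descriptors like these are computed after training, so the network's predictions can be checked against interpretable structural features rather than treated as an unexplained black box.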

“What we have done,” said Pope, “is provide additional analysis after the network is trained. The analysis quantifies different structural properties of these water cluster networks. Then we can compare them to our predicted neural network and, in subsequent data set examinations, improve the network’s understanding.”

Water has a neural network future

The study’s findings provide a foundation for analysis of water clusters’ structural patterns in more complex hydrogen-bonded networks, such as liquid water and ice.

“If you were able to train a neural network,” said Pope, “that neural network would be able to do computational chemistry on larger systems. And then you could make similar insights in computational chemistry about chemical structure or hydrogen bonding or the molecules’ response to temperature changes. Those are among the goals of this research.”

In addition to Choudhury, Pope, and Xantheas, the study’s co-authors include Joseph P. Heindel, Malachi Schram, and Pradipta Bandyopadhyay.

This research is based on work supported by the DOE Office of Science in part through DOE’s ECP ExaLearn Co-Design Center. PNNL is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830.


About PNNL

Pacific Northwest National Laboratory draws on its distinguishing strengths in chemistry, Earth sciences, biology and data science to advance scientific knowledge and address challenges in sustainable energy and national security. Founded in 1965, PNNL is operated by Battelle for the Department of Energy’s Office of Science, which is the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://www.energy.gov/science/. For more information on PNNL, visit PNNL's News Center. Follow us on Twitter, Facebook, LinkedIn and Instagram.