December 6, 2019
Feature

PNNL Researchers Make Splash with Machine Learning at SC19

At a conference featuring the most advanced computing hardware and software, ML in its various guises was on full display and highlighted by Nathan Baker’s featured invited presentation.

Nathan Baker talks machine learning at SC19


The official tagline for this year’s annual high-performance computing (HPC) conference, SC19, was “HPC is now,” but the message delivered by PNNL computational scientists hinted that “machine learning is next.” An award-winning student poster and a featured invited presentation on scientific machine learning (ML) highlighted the laboratory’s presence at the conference.


Baker took center stage on Tuesday, November 20, to map out the U.S. Department of Energy (DOE) Office of Science’s six priority research directions for scientific machine learning. Before an audience of hundreds, Baker explained that to make ML useful for science, it will be important to encode human knowledge—in fields such as medicine and physics—so that it can be applied to the complex problems best solved by HPC. Today’s ML and artificial intelligence (AI) programming can’t easily process decades of accumulated scientific data and graphs.

“Scientists need to know what’s going on in the fancy black box,” said Baker. “The question is: How can proofs or guarantees be created with machine learning? We need to develop guarantees that the work produced can be trusted.”

He then gave examples—from PNNL and elsewhere—of researchers starting to address the scientific ML accountability gap.

Modeling groundwater flow at the Hanford Site

One example Baker highlighted involves a complex simulation of groundwater flow in a real-world scenario. A research collaboration among scientists at PNNL, Brown University, Massachusetts Institute of Technology, and Lawrence Berkeley National Laboratory (LBNL) is tackling the thorny issue of understanding subsurface water flow at the Hanford Site. They are using computational tools that incorporate knowledge from physics with ML techniques. Their method encodes known physical properties of subsurface water flow and then applies that knowledge to data collected on the environmental conditions at the Hanford Site. Their results, presented at SC19, demonstrate the promise of physics-informed generative adversarial networks (GANs) for analyzing complex, large-scale science problems.
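The idea of combining known physics with sensor data can be sketched in a few lines. The example below is only an illustration under simplifying assumptions, not the team's GAN-based method: it assumes steady one-dimensional flow with constant conductivity, so the hydraulic head should have zero second derivative, and it scores a candidate solution by adding a physics penalty to the usual data mismatch. The function name `physics_informed_loss`, the `weight` parameter, and the sample values are hypothetical.

```python
def physics_informed_loss(head, sensors, dx, weight=1.0):
    """Score a candidate hydraulic-head profile on a uniform 1D grid.

    head    -- list of head values, one per grid node
    sensors -- dict mapping grid index -> measured head (hypothetical data)
    dx      -- grid spacing
    weight  -- relative importance of the physics term (an assumption)
    """
    # Data term: mean squared mismatch at the sensor locations.
    data = sum((head[i] - v) ** 2 for i, v in sensors.items()) / len(sensors)

    # Physics term: for steady flow with constant conductivity the second
    # derivative of head is zero, so penalize the finite-difference residual
    # at every interior node.
    residuals = [
        (head[i - 1] - 2 * head[i] + head[i + 1]) / dx**2
        for i in range(1, len(head) - 1)
    ]
    physics = sum(r * r for r in residuals) / len(residuals)

    return data + weight * physics


sensors = {0: 10.0, 4: 6.0}            # two sensors at the grid ends
linear = [10.0, 9.0, 8.0, 7.0, 6.0]    # fits the data and obeys the physics
bumpy = [10.0, 9.0, 9.5, 7.0, 6.0]     # fits the data but violates the physics

print(physics_informed_loss(linear, sensors, dx=1.0))
print(physics_informed_loss(bumpy, sensors, dx=1.0))
```

Both candidates match the two sensor readings exactly, so a purely data-driven score could not tell them apart. The physics term is what separates them, which is the sense in which encoded physical knowledge constrains the model where measurements are sparse.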

Hanford Site mapping
Hanford Site hydrology modeling site with locations of hydrology sensors for levels 1 (black) and 2 (color). Units are in km.

The model of groundwater flow at the Hanford Site set a computing record on Oak Ridge National Laboratory’s Summit supercomputer, achieving 93.1 percent scaling efficiency. The research project is funded by the Collaboratory for Mathematics and Physics-Informed Learning Machines for Multiscale and Multi-physics (PhILMS) program, which is co-led by Alex Tartakovsky, a computational mathematician and PNNL collaborator on the project.

“This exemplar project brought together experts from subsurface modeling, applied mathematics, deep learning, and HPC,” said LBNL’s Prabhat, a project collaborator, in an interview about the research. “As the DOE considers broader applications of deep learning – and, in particular, GANs – to simulation problems, I expect multiple research teams to be inspired by these results.”

Applying AI to improve medical diagnoses

Another team, led by PNNL computer scientists Khushbu Agarwal and Sutanay Choudhury, is teaching computers to interpret different types of health care data, including electronic health records and doctors’ clinical notes. The team, in collaboration with researchers at Stanford University, developed a new approach to incorporate over 300,000 medical concepts and definitions used by doctors into AI models as part of the PNNL-funded project Deep Care. Baker presented data showing that the team’s algorithm improved the prediction of patient diagnosis by 20 percent relative to other state-of-the-art clinical concept embeddings. The goal of Deep Care is to create an AI-based decision-support tool for doctors.

The influence game

Anyone versed in digital communications understands that some people have far more influence than others on social behavior and trends. Tracking influencers online has become an industry unto itself. Now, PNNL computer scientists, in collaboration with Washington State University, NVIDIA, and IBM, have developed a method called cuRipples that uses influencer principles to converge on solutions to large-scale computational problems up to 790 times faster than current methods.

NVIDIA GPU processor

In an invited talk at the NVIDIA industry booth in the SC19 exhibition hall, Marco Minutoli, a PNNL computer scientist, explained that cuRipples combines graph analytics with an understanding of which data are the most important “influencers.” In under a minute of compute time on a supercomputer, cuRipples can converge on a solution to a complex problem, such as how to contain the impact of airport traffic delays. A similar problem run on a supercomputer without cuRipples typically takes an entire day to converge on a similar solution. While the method is now in the testing phase, Minutoli says, “if we could converge on a reliable solution in under a minute, we could begin to use this as an interactive tool. And then we can ask and answer new questions in ways that begin to be practical for solving complex problems like analyzing transportation networks in real time.”
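For readers unfamiliar with the underlying problem, finding the most important "influencers" in a network can be illustrated with a small, generic sketch. This is not cuRipples, whose internals the talk summary does not describe. It is a textbook-style greedy search that assumes a simple spreading model in which each newly affected node gets one chance to affect each neighbor with a fixed probability. The function names, the default parameters, and the toy network are all hypothetical.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """Run one random spreading simulation and return how many nodes it reaches."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        # Each newly active node gets a single chance to activate each neighbor.
        for neighbor in graph.get(node, []):
            if neighbor not in active and rng.random() < prob:
                active.add(neighbor)
                frontier.append(neighbor)
    return len(active)


def greedy_seeds(graph, k, prob=0.1, trials=200, seed=0):
    """Pick k nodes one at a time, each time adding the node that most
    increases the average simulated spread (Monte Carlo estimate)."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            total = sum(
                simulate_cascade(graph, chosen + [candidate], prob, rng)
                for _ in range(trials)
            )
            spread = total / trials
            if spread > best_spread:
                best_node, best_spread = candidate, spread
        chosen.append(best_node)
    return chosen


# Toy directed network: an edge a -> b means a disruption at "a" can spread to "b".
network = {"a": ["b", "c", "d"], "b": ["c"], "e": ["f"], "f": []}
top_two = greedy_seeds(network, k=2, prob=1.0, trials=1)
print(top_two)
```

The repeated simulations inside the greedy loop are what make this kind of search expensive on large networks, and that cost is the motivation for accelerated approaches such as the one described above.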

###

About PNNL

Pacific Northwest National Laboratory draws on its distinguishing strengths in chemistry, Earth sciences, biology and data science to advance scientific knowledge and address challenges in sustainable energy and national security. Founded in 1965, PNNL is operated by Battelle for the Department of Energy’s Office of Science, which is the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science. For more information on PNNL, visit PNNL's News Center. Follow us on Twitter, Facebook, LinkedIn and Instagram.