The official tagline for this year’s annual high-performance computing (HPC) conference, SC19, was “HPC is now,” but the message delivered by PNNL computational scientists hinted that “machine learning is next.” An award-winning student poster and a featured invited presentation on scientific machine learning (ML) highlighted PNNL’s presence at the conference.
At a conference featuring the most advanced computing hardware and software, ML in its various guises was on full display and highlighted by Nathan Baker’s featured invited presentation.
Baker took center stage on Tuesday, November 20, to map out the U.S. Department of Energy (DOE) Office of Science’s six priority research directions for scientific machine learning. Before an audience of hundreds, Baker explained that to make ML useful for science, it will be important to encode human knowledge—in fields such as medicine and physics—so that it can be applied to the complex problems best solved by HPC. Today’s ML and artificial intelligence (AI) programming can’t easily process decades of accumulated scientific data and graphs.
“Scientists need to know what’s going on in the fancy black box,” said Baker. “The question is: How can proofs or guarantees be created with machine learning? We need to develop guarantees that the work produced can be trusted.”
He then gave examples—from PNNL and elsewhere—of researchers starting to address the scientific ML accountability gap.
Modeling groundwater flow at the Hanford Site
One example Baker highlighted involves a complex simulation of groundwater flow in a real-world scenario. A research collaboration among scientists at PNNL, Brown University, Massachusetts Institute of Technology, and Lawrence Berkeley National Laboratory (LBNL) is tackling the thorny issue of understanding subsurface water flow at the Hanford Site. They are using computational tools that incorporate knowledge from physics with ML techniques. Their method encodes known physical properties of subsurface water flow and then applies that knowledge to data collected on the environmental conditions at the Hanford Site. Their results, presented at SC19, demonstrate the promise of physics-informed generative adversarial networks (GANs) for analyzing complex, large-scale science problems.
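To make the idea of encoding physics into a learning method concrete, here is a minimal sketch of a physics-informed loss. It is not the team's GAN and not their code. It assumes, purely for illustration, one-dimensional steady flow with constant conductivity, so the hydraulic head h(x) should satisfy d2h/dx2 = 0. The function names, the grid, and the measurement values are all hypothetical.

```python
# Illustrative only: a physics-informed loss for 1-D steady groundwater flow.
# Assumed physics (not from the article): constant conductivity, so the
# hydraulic head h(x) satisfies d2h/dx2 = 0 on a uniform grid.


def physics_residual(head, dx):
    """Mean squared finite-difference estimate of d2h/dx2 at interior nodes."""
    residuals = [
        (head[i - 1] - 2.0 * head[i] + head[i + 1]) / dx**2
        for i in range(1, len(head) - 1)
    ]
    return sum(r * r for r in residuals) / len(residuals)


def data_misfit(head, observations):
    """Mean squared error at the grid indices where measurements exist."""
    errors = [head[i] - value for i, value in observations.items()]
    return sum(e * e for e in errors) / len(errors)


def physics_informed_loss(head, observations, dx, weight=1.0):
    """Data misfit plus a weighted penalty for violating the assumed physics."""
    return data_misfit(head, observations) + weight * physics_residual(head, dx)


# Hypothetical measurements: head of 10.0 at node 0 and 6.0 at node 4.
observations = {0: 10.0, 4: 6.0}
dx = 1.0

linear_profile = [10.0, 9.0, 8.0, 7.0, 6.0]   # fits data and obeys the physics
wiggly_profile = [10.0, 12.0, 5.0, 9.0, 6.0]  # fits data but violates the physics

loss_linear = physics_informed_loss(linear_profile, observations, dx)
loss_wiggly = physics_informed_loss(wiggly_profile, observations, dx)
```

Both candidate profiles match the two measurements exactly, so a purely data-driven loss could not tell them apart. The physics penalty is what separates them: the linear profile scores zero, while the wiggly one is penalized heavily.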
The model of groundwater flow at the Hanford Site set a computing record on Oak Ridge National Laboratory’s Summit supercomputer, achieving 93.1 percent scaling efficiency. The research project is funded by the Collaboratory for Mathematics and Physics-Informed Learning Machines for Multiscale and Multi-physics (PhILMS) program, which is co-led by Alex Tartakovsky, a computational mathematician and PNNL collaborator on the project.
“This exemplar project brought together experts from subsurface modeling, applied mathematics, deep learning, and HPC,” said LBNL’s Prabhat, a project collaborator, in an interview about the research. “As the DOE considers broader applications of deep learning – and, in particular, GANs – to simulation problems, I expect multiple research teams to be inspired by these results.”
Applying AI to improve medical diagnoses
Another team, led by PNNL computer scientists Khushbu Agarwal and Sutanay Choudhury, is teaching computers to interpret different types of health care data, including electronic health records and doctors’ clinical notes. The team, in collaboration with researchers at Stanford University, developed a new approach to incorporate over 300,000 medical concepts and definitions used by doctors into AI models as part of the PNNL-funded project Deep Care. Baker presented data showing that the team’s algorithm improved the prediction of patient diagnosis by 20 percent relative to other state-of-the-art clinical concept embeddings. The goal of Deep Care is to create an AI-based decision-support tool for doctors.
Computing security in the spotlight
PNNL computer scientists Ang Li and Kevin Barker are using an ML framework to ferret out clandestine use of HPC systems. As HPC systems become more powerful, they are increasingly exploited by attackers to run malicious software. A PNNL-based team, which also included Pengfei Zou, a graduate student at Clemson University, and Rong Ge, his advisor, presented a research poster at SC19 that showed a new workload classification framework can discern between authorized and illicit workloads with 95 percent accuracy. Li also discussed the team’s findings as a featured speaker at the DOE Office of Science booth in the exhibit hall. The research poster received third place among dozens of entries in the student research poster session. The research was made possible through the Center for Advanced Technology Evaluation (CENATE), a computing proving ground supported by the DOE’s Office of Advanced Scientific Computing Research.
Packed house for quantum computing
Interest in the future of quantum computing drew a capacity crowd, with many waiting just outside the doors for a chance to hear from national laboratory leaders, including Nathan Wiebe of PNNL. After opening remarks from Travis Humble, from Oak Ridge National Laboratory’s Quantum Computing Institute, Wiebe outlined opportunities for the national laboratories to make significant contributions to the development of what many observers expect to be a major thrust for the computing industry globally over the coming decades. Wiebe stressed the need for new materials that make quantum processing more stable and reliable. The national laboratories have a lot to contribute to materials research, algorithm development for quantum systems, and the adaptation of existing algorithms to quantum computing frameworks.
The influence game
Anyone versed in digital communications understands that some people have far more influence than others on social behavior and trends. Tracking influencers online has become an industry unto itself. Now, PNNL computer scientists, in collaboration with Washington State University, NVIDIA, and IBM, have developed a method called cuRipples that uses influencer principles to converge on solutions to large-scale computational problems up to 790 times faster than current methods.
In an invited talk at the NVIDIA industry booth in the SC19 exhibition hall, Marco Minutoli, a PNNL computer scientist, explained that cuRipples combines graph analytics with an understanding of which data are the most important “influencers.” In under a minute of compute time on a supercomputer, cuRipples can converge on a solution to a complex problem, such as how to contain the impact of airport traffic delays. A similar problem run on a supercomputer without cuRipples typically takes an entire day to converge on a similar solution. While the method is now in the testing phase, Minutoli says, “if we could converge on a reliable solution in under a minute, we could begin to use this as an interactive tool. And then we can ask and answer new questions in ways that begin to be practical for solving complex problems like analyzing transportation networks in real time.”
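For readers curious about what finding the most important “influencers” in a graph can look like, here is a minimal sketch of a textbook approach: greedy seed selection with Monte Carlo simulation of an independent-cascade spread model. It is not the cuRipples algorithm or its API. The function names, the toy graph, the activation probability, and the trial count are all assumptions chosen for illustration.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """One independent-cascade run; returns how many nodes end up activated."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # Each newly active node gets one chance per outgoing edge.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, prob, trials, rng):
    """Monte Carlo estimate of the average number of activated nodes."""
    total = sum(simulate_cascade(graph, seeds, prob, rng) for _ in range(trials))
    return total / trials


def greedy_influencers(graph, k, prob=0.5, trials=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    chosen = []
    for _ in range(k):
        best = max(
            sorted(nodes - set(chosen)),
            key=lambda c: expected_spread(graph, chosen + [c], prob, trials, rng),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy network: "hub" reaches five nodes, "b" reaches two.
graph = {
    "hub": ["f1", "f2", "f3", "f4", "f5"],
    "b": ["b1", "b2"],
}
top_two = greedy_influencers(graph, k=2)
```

This brute-force sketch re-simulates the cascade for every candidate at every step, which is the kind of cost that becomes prohibitive on large graphs and that faster methods aim to avoid.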