March 6, 2023

Understanding and Predicting Extreme Rainfall Events through Machine Learning

Using machine learning to understand and predict extreme precipitation

Graphic showing flooded street with cars, overlaid with data representations and computer stacks.

Extreme rainfall events that can lead to dangerous flooding have increased in the United States.

(Image by Shannon Colson | Pacific Northwest National Laboratory)

Extreme rainfall can cause severe flooding that is hazardous to human life and property. In recent decades, the United States has seen a broad increase in extreme precipitation events. Understanding the environmental factors that influence how often these events happen and how severe they are can help planners better anticipate extremes while designing future infrastructure or planning infrastructure maintenance.

In new work published in the Journal of Advances in Modeling Earth Systems, researchers at Pacific Northwest National Laboratory (PNNL) used different machine learning (ML) methods to predict the strength and occurrences of extreme precipitation events across the United States. The researchers divided the country into six sections (West Coast, Mountain, North Great Plain, South Great Plain, Northeast, and Southeast) to check for regional differences.

The team built ML models to examine how often extreme precipitation events will occur, known as their return period. For example, an amount of rainfall that only happens once every 50 years has a 50-year return period. However, as extreme events become more frequent, their return periods are getting shorter.
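The arithmetic behind a return period can be made concrete with a short sketch. This is not the researchers' code. It uses the standard textbook relationship between a return period and the yearly chance of an event, and it assumes each year is independent of the others. The function names and the 30-year planning horizon are illustrative assumptions.

```python
def annual_exceedance_probability(return_period_years: float) -> float:
    """Chance that the event is equaled or exceeded in any single year."""
    if return_period_years < 1:
        raise ValueError("return period must be at least 1 year")
    return 1.0 / return_period_years


def chance_within_horizon(return_period_years: float, horizon_years: int) -> float:
    """Chance of at least one such event over the horizon.

    Assumes years are independent, which is a simplification.
    """
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


# Illustrative comparison: a 50-year event versus the same rainfall amount
# if its return period shortened to 25 years (hypothetical values).
for period in (50, 25):
    chance = chance_within_horizon(period, horizon_years=30)
    print(f"{period}-year event, 30-year horizon: {chance:.1%}")
```

Under these assumptions, a 50-year event has a 2 percent chance of occurring in any given year, and a shorter return period raises the chance that infrastructure sees at least one such event during its service life.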

“Developing those machine learning models required close collaboration between domain scientists and data scientists,” said Jiwen Fan, PNNL Earth scientist and Lab fellow. “We had to check if the results generated by machine learning models made sense. Were the representations physically possible?”

Understanding data to develop better models

Part of creating logical ML models was identifying the right factors to study. For complex processes like precipitation, an overwhelming number of possible factors could go into a model. In this work, the researchers started with around 300 potential factors.

“Selecting what factors to model is a constant challenge in our work,” said Jason Hou, a PNNL data scientist and coauthor. “Identifying relevant data is an area where collaboration with domain scientists has a major impact.”

The team narrowed down the potential factors to around 40, which they then fed into the ML model. The model identified the ten most important factors for deeper study.
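The idea of ranking candidate factors and keeping only the strongest can be sketched in a few lines. The article does not say how the ML model scored importance, so the example below swaps in a much simpler stand-in: ranking by the absolute Pearson correlation between each factor and the target. The factor names, the toy numbers, and the helper functions are all hypothetical and are not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_factors(candidates, target, k):
    """Return the names of the k factors most correlated with the target.

    Stand-in for a model-based importance score (an assumption).
    """
    scores = {name: abs(pearson(values, target)) for name, values in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up data: six samples of a target and three candidate factors.
target = [1, 2, 3, 4, 5, 6]
candidates = {
    "factor_a": [2, 4, 6, 8, 10, 12.5],  # tracks the target closely
    "factor_b": [3, 1, 4, 1, 5, 9],      # loosely related
    "factor_c": [5, 3, 6, 2, 4, 5],      # nearly unrelated
}

print(top_factors(candidates, target, k=2))
```

A simple correlation only captures linear relationships between one factor and the target, which is one reason a study like this would rely on ML models and on domain scientists to judge whether the top-ranked factors make physical sense.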

During the process, the team worked to make sure the data complied with FAIR (findable, accessible, interoperable, and reusable) data standards. These international standards were designed to allow multiple research efforts to use the same dataset. “FAIR data is key to enabling reproducible machine learning,” said Hou.

Representing extreme precipitation

The newly developed ML models accurately represented current monthly extreme precipitation events for most regions, with slightly larger relative errors in the Mountain region. The models provided more than just general predictions. They also output relationships between various extreme precipitation characteristics and environmental properties.

From this data, the researchers identified surface latent heat flux (the water-driven exchange of energy between the surface and atmosphere), soil moisture, and relative humidity as three key factors that influence extreme precipitation events across all regions. These factors are consistent with how scientists currently understand precipitation, validating the model’s physical relevance.

“These ML models really represent what we know about extreme precipitation events well,” said Fan.

The researchers plan to use the models to project future extreme precipitation events under a changing climate. One challenge is obtaining data with the proper resolution and dimension from climate models to feed into the ML models, and the team remains on the lookout for suitable datasets.