Technical Session 2
Artificial Intelligence/Machine Learning-Empowered Digitization of Environmental Systems
Tuesday, November 14, 2023 | 1:00 - 5:00 p.m. Pacific Time
► Watch the recording

Artificial intelligence (AI) and machine learning (ML) are powerful tools for capitalizing on the broad range and large quantity of environmental data to provide insights into complex environmental processes and enable more effective environmental management. Presentations in this session will cover research contributions that encompass a broad range of topics related to the application of AI, ML, and big data analytics to environmental monitoring, modeling, and remediation. These include, but are not limited to, development and application of AI/ML or digital twin (DT) models to facilitate understanding and design of multiscale environmental management and remediation systems; physics-informed ML and ML-guided numerical modeling to increase the effectiveness and reduce the effort of designing remediation and monitoring systems; reduction of system complexity or identification of influential drivers of environmental system behaviors; and exploratory data analysis, pattern recognition, and signature discovery to provide a better understanding of system dynamics and spatial heterogeneity.
Session Organizers: Z. Jason Hou, Pacific Northwest National Laboratory; Haruko M. Wainwright, Massachusetts Institute of Technology; and Reed Maxwell, Princeton University
1:00 - 1:05 p.m. Opening Remarks
1:05 - 1:25 p.m. Survey Unit Selection for Sample Representativeness in Site Contamination Studies Narmadha Mohankumar, Pacific Northwest National Laboratory ► PRESENTATION PDF
The investigation of potential contamination at a site or facility involves the collection of data samples from the region, with the aim of determining whether the average contamination level surpasses a predefined threshold. However, the sampling process must address important considerations, such as determining the optimal number of samples to collect and their appropriate locations in order to obtain a representative sample of the entire region. A crucial aspect to understand in this context is the influence of spatial autocorrelation. Spatial autocorrelation refers to the extent to which georeferenced data points exhibit similarity to each other, which can be influenced by factors such as the type of contamination, the initial dispersal in the region, and the pathways of migration. As spatial autocorrelation increases, the amount of duplicate information within the sampled data may also increase, which can negatively impact conclusions from traditional hypothesis tests commonly used in site contamination studies. Moreover, the collection of duplicate information due to spatial autocorrelation may result in unnecessary sampling effort. We discuss the interplay of spatial autocorrelation and sample representativeness and how unidentified spatial autocorrelation can lead to misleading conclusions when employing hypothesis tests. We present our research on determining the effective sample size and identifying the optimal placement of samples using a simulation-based methodology. Furthermore, we quantify uncertainty in metrics such as the standard error, upper tolerance limit (UTL), and upper specification limit (USL) to account for spatial autocorrelation in the analysis of spatially correlated data for site contamination studies. By incorporating these advancements, we aim to obtain robust inferences and predictions regarding site contamination, enabling the derivation of reliable and accurate conclusions for better decision-making in site contamination studies. Coauthors: Jen Huckett (Pacific Northwest National Laboratory), Deb Fagan (Pacific Northwest National Laboratory), Moses Obiri (Pacific Northwest National Laboratory)
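To make the effective-sample-size idea concrete, the sketch below is a minimal, hypothetical illustration (not the presenters' simulation-based methodology): under an assumed exponential correlation model, the effective sample size of the generalized-least-squares mean estimator is 1'R^-1 1, which equals n when the samples are uncorrelated and shrinks as autocorrelation strengthens.

```python
import numpy as np

# Hypothetical illustration (not the presenters' code): effective sample
# size of n spatially correlated observations under an assumed exponential
# correlation model. For the generalized-least-squares estimate of the mean
# with correlation matrix R, n_eff = 1' R^{-1} 1; with R = I this equals n.
def effective_sample_size(coords, corr_range):
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    R = np.exp(-dists / corr_range)          # exponential correlogram
    ones = np.ones(len(coords))
    return ones @ np.linalg.solve(R, ones)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))   # 50 sample locations in a 100 m square
for corr_range in (1.0, 10.0, 50.0):         # short to long correlation ranges
    print(corr_range, effective_sample_size(coords, corr_range))
```

As the correlation range grows, the effective sample size drops well below the nominal 50 samples, which is precisely the duplicate-information effect the abstract describes.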
1:25 - 1:45 p.m. Data Analytics for Climate Change Impact Satyarth Praveen, Lawrence Berkeley National Laboratory ► PRESENTATION PDF
Climate resilience is defined as the capacity of an individual site to perform according to its regulatory requirements while impacted by potential stresses imposed by climate variability, weather extremes, and related impacts projected by future climate scenarios. As highlighted in the US Global Change Research Program's recent National Climate Assessment, the impacts of climate change are broadly distributed across the United States, with regionally specific effects potentially threatening sites and site infrastructure under remediation or with waste disposal cells. Therefore, climate resilience strategies must be developed by evaluating and assessing the long-term performance of these sites under various climatic conditions. We developed the climate-resilience package (https://pypi.org/project/climate-resilience/) to aid long-term climate resilience and vulnerability assessment for soil and groundwater contaminated sites. It significantly simplifies the process of downloading, preprocessing, and visualizing spatial and temporal information for different sites across the US. Formal documentation supplements the package, with sample scripts and notebooks that demonstrate its usage. The package uses Google Earth Engine to download the CMIP5 climate model dataset, with options for different models, scenarios, and variables across sites. In addition to the time-series data, the toolkit supports the computation of climate metrics, for example, the number of extreme precipitation days and extreme degree days. It is also integrated with external datasets, such as the Standardized Precipitation Evapotranspiration Index (SPEI), to provide effective visualization and integrated insights for long-term drought analysis. We also developed a bias-correction pipeline that uses meteorological measurements to improve climate projections. This toolkit has been used to develop the Climate Adaptation and Resilience Plan (CARP) and the Vulnerability Assessment and Resilience Planning (VARP) guidance for the 118 DOE Office of Legacy Management and Environmental Management sites. This open-source package is intended to be used by researchers and the general audience. Coauthors: Zexuan Xu (Lawrence Berkeley National Laboratory), Haruko M. Wainwright (Massachusetts Institute of Technology), Varsha Madapoosi (University of California Berkeley)
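To give a flavor of the climate metrics mentioned above, the sketch below counts extreme precipitation days with pandas. It is an independent illustration using one common percentile-based definition, not the climate-resilience package's actual API (which is documented on PyPI); the synthetic data and the 99th-percentile wet-day threshold are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative sketch only (not the climate-resilience package's API):
# count "extreme precipitation days", here defined as days exceeding the
# 99th percentile of wet days (>= 1 mm) over a historical baseline.
rng = np.random.default_rng(1)
days = pd.date_range("2006-01-01", "2015-12-31", freq="D")
precip = pd.Series(rng.gamma(shape=0.5, scale=6.0, size=len(days)), index=days)

wet = precip[precip >= 1.0]                    # wet-day baseline
threshold = wet.quantile(0.99)                 # extreme threshold (mm/day)
extreme_days_per_year = (precip > threshold).groupby(precip.index.year).sum()
print(extreme_days_per_year)
```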
1:45 - 2:05 p.m. Physics-Informed Surrogate Modeling for Supporting Climate Resilience at Groundwater Contamination Sites Lijing Wang, Lawrence Berkeley National Laboratory ► PRESENTATION PDF
Contamination of soil and groundwater presents a widespread global problem, significantly impacting both human well-being and environmental stability. Conventional models employed for estimating pollutant concentrations under varying climatic conditions demand extensive computational power and high-performance computing resources. In response to this issue, we have devised an innovative method utilizing a physics-informed machine learning technique, known as the U-Net Enhanced Fourier Neural Operator (U-FNO), to generate rapid surrogate models for flow and transport. These models are capable of forecasting groundwater pollution levels under diverse climatic situations and subsurface characteristics without necessitating a supercomputer. In our research, we centered our attention on the Department of Energy's Savannah River Site (SRS) F-Area and established two time-dependent architectures: U-FNO-3D and U-FNO-2D. Both frameworks incorporated a tailored loss function comprising data-driven terms and physics-based boundary constraints. The findings of our study indicate that the FNO and U-FNO models can consistently foresee spatial-temporal fluctuations in groundwater flow and pollutant transport properties, such as contaminant concentration, hydraulic head, and Darcy velocity. Our research reveals that the U-FNO-2D architecture is especially adept at predicting the effects of alterations in recharge rates on groundwater contamination sites, delivering superior time-dependent forecasts compared to the U-FNO-3D structure. Our novel approach holds the potential to revolutionize environmental monitoring and remediation efforts by providing rapid, precise, and cost-efficient estimations of groundwater pollution levels under uncertain climate conditions. Coauthors: Satyarth Praveen (Lawrence Berkeley National Laboratory), Zexuan Xu (Lawrence Berkeley National Laboratory), Haruko M. Wainwright (Massachusetts Institute of Technology)
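The tailored loss described above can be pictured as a weighted sum of a data misfit and a boundary-condition penalty. The PyTorch sketch below is a hedged illustration of that general pattern; the actual U-FNO loss terms, weights, and boundary handling used for the SRS F-Area models are not reproduced here, and the field shapes and fixed-head boundary are assumptions.

```python
import torch

# Hedged sketch of a physics-informed loss: a data-driven misfit plus a
# penalty enforcing a known boundary condition. Weights and boundary
# specification are illustrative assumptions, not the study's values.
def physics_informed_loss(pred, target, boundary_mask, boundary_value, w_bc=0.1):
    data_loss = torch.mean((pred - target) ** 2)                 # data misfit
    bc_loss = torch.mean((pred[boundary_mask] - boundary_value) ** 2)
    return data_loss + w_bc * bc_loss                            # weighted sum

pred = torch.rand(4, 64, 64, requires_grad=True)   # e.g., hydraulic head fields
target = torch.rand(4, 64, 64)
mask = torch.zeros(4, 64, 64, dtype=torch.bool)
mask[:, 0, :] = True                               # fixed-head boundary row
loss = physics_informed_loss(pred, target, mask, boundary_value=0.0)
loss.backward()
```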
2:05 - 2:25 p.m. Long Term Ground Water Monitoring Using LSTM Algorithm for Anomaly Detection Jayesh Soni, Florida International University ► PRESENTATION PDF
Advances in low-cost in-situ sensors present a significant opportunity for improving long-term groundwater monitoring, such as through automated anomaly detection. However, effectively utilizing the data from these sensors poses a significant challenge due to fluctuations, noise, and other artifacts. To address this, Artificial Intelligence/Machine Learning (AI/ML) models are being developed to predict contaminant dynamics and to perform anomaly detection. This work focuses on the development of an AI/ML-based anomaly detection model, leveraging the capabilities of Long Short-Term Memory (LSTM) models. LSTM, a type of recurrent neural network, excels at handling sequential data and capturing temporal dependencies. The proposed LSTM-based approach enables the detection of anomalies in sensor readings, allowing for the early identification of abnormal contamination levels in various analytes. By training the LSTM model on historical data of normal sensor readings, it learns the patterns and regularities inherent in the groundwater contamination data. Subsequently, the model evaluates real-time sensor measurements against these learned patterns, classifying deviations as anomalies and indicating potential issues. The strength of LSTM lies in its ability to capture long-term dependencies and extract meaningful features from the sequential nature of the data. By considering the historical context and relationships between previous sensor readings, the LSTM model can effectively differentiate between normal variations and true anomalies, enabling proactive intervention and mitigation measures. Successful implementation of the proposed LSTM-based anomaly detection model will provide contaminated groundwater sites with a reliable tool for continuous monitoring of groundwater contamination. The timely detection of anomalies will facilitate prompt response and remedial actions, minimizing the potential spread of contamination and mitigating risks to the environment and public health. We demonstrate the algorithms on datasets from the Savannah River Site F-Area. Coauthors: Himanshu Upadhyay (Florida International University), Haruko Wainwright (Massachusetts Institute of Technology), Zexuan Xu (Lawrence Berkeley National Laboratory), Leonel Lagos (Florida International University)
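A minimal sketch of the forecast-and-threshold idea follows; the presenters' architecture, window length, and anomaly threshold are not given in the abstract, so every value below is an illustrative assumption. An LSTM is trained to predict the next reading from a window of normal data, and readings with unusually large forecast error are flagged.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: an LSTM forecaster trained on "normal" windows;
# readings whose forecast error exceeds a threshold are flagged as anomalies.
class LSTMForecaster(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                    # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])      # predict the next reading

model = LSTMForecaster()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

seq = torch.sin(torch.linspace(0, 50, 500)).unsqueeze(-1)  # synthetic normal signal
window = 24
X = torch.stack([seq[i:i + window] for i in range(len(seq) - window)])
y = torch.stack([seq[i + window] for i in range(len(seq) - window)])

for _ in range(100):                         # brief training on normal data
    optim.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    optim.step()

with torch.no_grad():                        # flag large forecast errors
    err = (model(X) - y).abs().squeeze()
    anomalies = torch.nonzero(err > 3 * err.mean()).squeeze(-1)
    print(f"{len(anomalies)} of {len(err)} readings flagged")
```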
2:25 - 2:45 p.m. Leveraging a Data-Driven Approach for Optimizing Pump-and-Treat Well Network Operations Xuehang Song, Pacific Northwest National Laboratory ► PRESENTATION PDF
Pump-and-treat (P&T) is a common groundwater remediation technique that involves the extraction of contaminated water and its subsequent treatment above ground. Ensuring the consistent effectiveness of contaminant removal throughout the operational lifetime of the remedy, which typically spans decades, is critical for a successful remedial strategy. This requires dynamic management of the extraction well network. Coauthors: Mark Rockhold (Pacific Northwest National Laboratory), Bryan He (Pacific Northwest National Laboratory), Marinko Karanovic (S.S. Papadopulos and Associates Inc.), Matt Tonkin (S.S. Papadopulos and Associates Inc.), Inci Demirkanli (Pacific Northwest National Laboratory), Rob Mackley (Pacific Northwest National Laboratory)
2:45 - 3:15 p.m. Posters and Vendor Exhibit
3:15 - 3:35 p.m. Efficient Super-Resolution of Near-Surface Climate Modeling Using the Fourier Neural Operator Peishi Jiang, Pacific Northwest National Laboratory ► PRESENTATION PDF
Downscaling methods are critical in efficiently generating high-resolution atmospheric data. However, state-of-the-art statistical or dynamical downscaling techniques either suffer from the high computational cost of running a physical model or require high-resolution data to develop a downscaling tool. Here, we demonstrate a recently proposed zero-shot super-resolution method, the Fourier neural operator (FNO), to efficiently perform downscaling without the need for high-resolution data. Because the FNO learns dynamics in Fourier space, it is a resolution-invariant emulator; it can be trained at a coarse resolution and produce emulations at any higher resolution. We applied the FNO to downscale a 4-km resolution Weather Research and Forecasting (WRF) Model simulation of near-surface heat-related variables over the Great Lakes region. The FNO is driven by the atmospheric forcings and topographic features used in the WRF model at the same resolution. We incorporated a physics-constrained loss in the FNO by using the Clausius-Clapeyron relation to better constrain the relations among the emulated states. Trained on merely 600 WRF snapshots at 4-km resolution, the FNO shows performance comparable to a widely used convolutional network, U-Net, achieving an average modified Kling-Gupta efficiency (mKGE) of 0.88 and 0.94 on the test dataset for temperature and pressure, respectively. We then employed the FNO to produce 1-km emulations that reproduce fine climate features. Further, by taking the WRF simulation as ground truth, we show consistent performance at the two resolutions, suggesting the reliability of the FNO in producing high-resolution dynamics. Our study demonstrates the potential of using the FNO for zero-shot super-resolution to generate first-order estimates in atmospheric modeling. Coauthors: Peishi Jiang (Pacific Northwest National Laboratory), Zhao Yang (Pacific Northwest National Laboratory), Jiali Wang (Argonne National Laboratory), Chenfu Huang (Michigan Technological University), Pengfei Xue (Michigan Technological University), TC Chakraborty (Pacific Northwest National Laboratory), Xingyuan Chen (Pacific Northwest National Laboratory), Yun Qian (Pacific Northwest National Laboratory)
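For reference, a plausible reading of the mKGE score quoted above is the modified Kling-Gupta efficiency (Kling et al., 2012), which combines correlation, a bias ratio, and a variability ratio based on coefficients of variation; a perfect emulation scores 1. A minimal NumPy version, with synthetic data standing in for the study's fields:

```python
import numpy as np

# Sketch of the modified Kling-Gupta efficiency (Kling et al., 2012), assumed
# here to be the mKGE score cited above: correlation (r), bias ratio (beta),
# and variability ratio (gamma, a ratio of coefficients of variation).
def mkge(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

obs = np.array([290.0, 291.5, 293.2, 292.8, 294.1])   # e.g., 2-m temperature (K)
sim = obs + np.random.default_rng(2).normal(0, 0.5, obs.size)
print(mkge(sim, obs))                                  # 1.0 is a perfect score
```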
3:35 - 3:55 p.m. Machine Learning Analysis of Western US Fire Impacts on Hailstorms in the Central US Xinming Lin, Pacific Northwest National Laboratory ► PRESENTATION PDF
Fires, particularly wildfires, can severely deteriorate air quality and disrupt various services, including transportation, communications, and utilities such as power, gas, and water supply. By altering atmospheric conditions such as temperature and aerosol loading, they may trigger or modify severe convective storms. This study investigates the remote impacts of fires in the western United States (WUS) on the occurrence of large hail (size >= 1 inch) in the central US (CUS) over a 20-year period (2001-2020) using machine learning (ML) methods. We develop random forest (RF) and eXtreme Gradient Boosting (XGB) classification models to explore the linkage between fire features in the WUS and the occurrence of large hail in the CUS, and then identify the important variables contributing to the occurrence of large hail. Both RF and XGB models can predict the occurrence of large hail, especially in states such as South Dakota (SD) and Nebraska (NE), with model accuracy greater than 90% and F1-scores up to 0.78. The variable rankings from both models show that temperature and relative humidity in the fire region and westerly winds, which are related to the transport of moisture and aerosols, are the most influential variables. These results, obtained from the analysis of long-term data, are consistent with the findings from our previous single-case modeling study regarding the influence of WUS wildfires on the occurrence of CUS large hail. Coauthors: Jiwen Fan (Pacific Northwest National Laboratory), Z. Jason Hou (Pacific Northwest National Laboratory)
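The classify-then-rank-variables workflow described above maps naturally onto standard scikit-learn and XGBoost tooling. The sketch below uses synthetic stand-in features (the study's exact predictors and data are not reproduced here; the feature names are assumptions) to show the pattern of fitting both models and reading off variable importances:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Illustrative sketch with synthetic stand-ins for WUS fire / CUS hail
# predictors (temperature, relative humidity, westerly wind, fire intensity
# are assumed feature names, not the study's exact inputs).
rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 4))                                 # [T, RH, u-wind, fire]
y = ((X[:, 0] + 0.5 * X[:, 2] - X[:, 1]) > 0.5).astype(int)    # synthetic hail label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")):
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(type(model).__name__,
          accuracy_score(y_te, pred), f1_score(y_te, pred),
          model.feature_importances_)        # variable-importance ranking
```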
3:55 - 4:15 p.m. Generative AI for Environmental Data Synthesis, Generation, and Augmentation Z. Jason Hou, Pacific Northwest National Laboratory ► PRESENTATION PDF
Generative AI, with its ability to create new, high-resolution data from sparse or missing inputs, offers a transformative approach to environmental data synthesis, generation, and augmentation. The motivation behind applying these technologies lies in addressing persistent challenges faced by the environmental sciences, including limited geographical coverage, temporal gaps in data collection, and the complex and multidimensional nature of environmental phenomena. This presentation discusses recent progress in using generative AI to interpolate and extrapolate from limited field characterization and monitoring data, benefiting regions with insufficient monitoring capabilities, enhancing the quality of environmental simulations and predictive models, and advancing our ability to design effective mitigation strategies. It also discusses challenges associated with the accuracy of generated data, potential bias, and data privacy, as well as ethical guidelines and data validation techniques to ensure the reliability of findings and the responsible use of generated data.
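As one concrete, purely illustrative instance of the gap-filling theme (the presenter's actual generative models are not specified in the abstract), the sketch below trains a small denoising autoencoder to reconstruct complete multivariate records from randomly masked inputs:

```python
import torch
import torch.nn as nn

# Hedged sketch of one generative-style gap-filling idea: a denoising
# autoencoder learns to impute complete records from sparse observations.
# Architecture, masking rate, and data are illustrative assumptions.
ae = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 16))
optim = torch.optim.Adam(ae.parameters(), lr=1e-3)

full = torch.randn(256, 16)                        # complete multivariate records
for _ in range(200):
    mask = (torch.rand_like(full) > 0.3).float()   # drop ~30% of entries
    recon = ae(full * mask)                        # reconstruct from sparse input
    loss = ((recon - full) ** 2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()

# At inference, feed a sparse record and read imputed values where mask == 0.
```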
4:15 - 5:00 p.m. Challenges and Opportunities in AI/ML-Empowered Digitization of Environmental Systems (Group Discussion)
Panelists and conference participants are invited to reflect on the subjects outlined in the session. Facilitated by Z. Jason Hou, Pacific Northwest National Laboratory; Haruko M. Wainwright, Massachusetts Institute of Technology; and Reed Maxwell, Princeton University.