Technical note: Using Long Short-term Memory Models to Fill Data Gaps in Hydrological Monitoring Networks

April 8, 2022

Journal Article

Abstract

The spatio-temporal dynamics in subsurface hydrological flows over a long time window are usually quantified through a network of monitoring wells; however, such observations often are spatially sparse and temporal gaps exist due to poor quality or instrument failure. In this study, we explore the ability of recurrent neural networks to fill gaps in a spatially distributed time-series dataset from a well network that monitors the dynamic and heterogeneous hydrologic exchanges between the Columbia River and its adjacent groundwater aquifer at the U.S. Department of Energy’s Hanford site. This 10-year-long dataset contains hourly temperature, specific conductance, and groundwater table elevation measurements from 42 wells with various lengths of gaps. We employ a long short-term memory (LSTM) model to capture the temporal variations in the observed system behaviors for gap filling. The performance of the LSTM-based gap filling method was evaluated against a traditional autoregressive integrated moving average (ARIMA) method in terms of both the error statistics and how well they capture the temporal patterns in river corridor wells that exhibit various dynamics signatures. Our study demonstrates that the ARIMA models yield better average error statistics, yet they tend to have larger errors during time windows with abrupt changes or high-frequency (daily and subdaily) variations. The LSTM-based models are found to excel in capturing both the high-frequency and low-frequency (monthly and seasonal) dynamics, although the inclusion of high-frequency fluctuations may also lead to overly dynamic predictions in time windows that lack such fluctuations. The LSTM is able to take advantage of the spatial information from neighboring wells to improve the gap filling accuracy, especially for long gaps in system states that vary at subdaily scales.
Despite the fact that LSTM models require substantial training data and have limited extrapolation power beyond the conditions represented in the training data, they afford great flexibility to account for the spatial and temporal correlations and nonlinearity in data without a priori assumptions. Thus, LSTMs provide effective alternatives to fill in data gaps in spatially distributed time-series observations characterized by multiple dominant frequencies of variability, which are essential for advancing our understanding of dynamic complex systems.
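
As a rough illustration of the data preparation that gap filling with a sequence model typically involves, the sketch below locates gaps in an hourly series and builds training windows that combine a target well with a neighboring well. It is not taken from the article: the function names, the one-step-ahead framing, the `lookback` length, and the sample readings are all assumptions made for illustration, and the abstract does not specify how the authors constructed their inputs.

```python
import math

def is_missing(x):
    """Treat None and NaN as missing observations."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def find_gaps(series):
    """Return (start, end) index pairs (end exclusive) for each run of missing values."""
    gaps, start = [], None
    for i, x in enumerate(series):
        if is_missing(x):
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(series)))
    return gaps

def make_windows(target, neighbors, lookback):
    """Build (inputs, label) training pairs from fully observed stretches.

    inputs: `lookback` time steps, each a list [target value, neighbor values...]
    label:  the target-well value at the step right after the window
    (a one-step-ahead framing, assumed here for simplicity).
    """
    samples = []
    for t in range(lookback, len(target)):
        rows = [[target[k]] + [n[k] for n in neighbors]
                for k in range(t - lookback, t)]
        values = [v for row in rows for v in row] + [target[t]]
        # Keep only windows with no missing value in inputs or label.
        if not any(is_missing(v) for v in values):
            samples.append((rows, target[t]))
    return samples

# Hypothetical hourly readings; None marks a gap in the target well.
well_a = [1.0, 1.1, None, None, 1.4, 1.5, 1.6]
well_b = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6]

gaps = find_gaps(well_a)
samples = make_windows(well_a, [well_b], lookback=2)
```

Windows of this shape (time steps by features) are the usual input layout for an LSTM layer in common deep learning libraries. Adding or dropping neighbor columns is one simple way to test whether spatial information from nearby wells helps for a given gap.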

Published: April 8, 2022

Citation

Ren H., E. Cromwell, B.S. Kravitz, and X. Chen. 2022. Technical note: Using Long Short-term Memory Models to Fill Data Gaps in Hydrological Monitoring Networks. Hydrology and Earth System Sciences 26, no. 7:1727–1743. PNNL-SA-161724. doi:10.5194/hess-26-1727-2022