June 18, 2025
Journal Article
Machine Learned Empirical Numerical Integrator from Simulated Data
Abstract
Recently, a number of state-of-the-art surrogate machine learning (ML) models have been designed for global weather and climate prediction and trained on reanalysis data products. Reanalysis data products are constructed using numerical model simulations that combine partial differential equations and parameterization schemes, and they are typically made available only at coarsened spatial and temporal resolutions. This study explores how the numerical methods used to generate the training datasets, and the temporal resolution of those datasets, affect machine learning surrogate models. Using the nonlinear vector autoregression (NVAR) machine as an explainable ML technique, simple dynamical system models integrated with three classical numerical integration schemes are emulated. The NVAR machine produces exceedingly skillful predictions for longer than 20 Lyapunov times by accurately reconstructing not only the dynamics but also the numerical integration scheme that produced the training data. However, the machine fails to make predictions on unseen test data generated by a different numerical integration scheme, even though the underlying dynamical system is the same. This result is a word of caution for the growing field of machine learning emulation of weather and climate dynamics. Furthermore, we illustrate using the NVAR machine that training on temporally coarsened data could increase the required complexity of ML models and potentially introduce numerical challenges. An additional outcome is that higher-order empirical integration schemes, some producing improved accuracy compared to classical integration schemes, can be constructed directly from the data.
Published: June 18, 2025
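The abstract's central claim, that an NVAR-style model learns the integration scheme along with the dynamics, can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the article: the Lorenz-63 system with its standard parameters, forward Euler and classical RK4 as the two schemes, a step size of `dt = 0.005`, a feature set of constant, linear, and quadratic monomials with no time delays, and an unregularized least-squares fit. Because a forward Euler step of Lorenz-63 is itself a quadratic polynomial in the state, this feature set can represent that scheme exactly, which makes the scheme dependence easy to see.

```python
import numpy as np

# Standard Lorenz-63 parameters (assumed test system, not from the article).
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz63(s):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = s
    return np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])

def euler_step(s, dt):
    """One forward Euler step."""
    return s + dt * lorenz63(s)

def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz63(s)
    k2 = lorenz63(s + 0.5 * dt * k1)
    k3 = lorenz63(s + 0.5 * dt * k2)
    k4 = lorenz63(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory(step, s0, dt, n_steps):
    """Integrate from s0 with the given one-step scheme."""
    out = np.empty((n_steps + 1, 3))
    out[0] = s0
    for i in range(n_steps):
        out[i + 1] = step(out[i], dt)
    return out

def features(states):
    """Constant, linear, and quadratic monomials of the current state (no delays)."""
    x, y, z = states.T
    return np.column_stack(
        [np.ones_like(x), x, y, z, x * x, x * y, x * z, y * y, y * z, z * z]
    )

def one_step_rmse(W, traj):
    """RMSE of the learned one-step map s_n -> s_{n+1} along a trajectory."""
    pred = traj[:-1] + features(traj[:-1]) @ W
    return float(np.sqrt(np.mean((pred - traj[1:]) ** 2)))

dt = 0.005  # illustrative step size
s0 = np.array([1.0, 1.0, 1.0])
train = trajectory(euler_step, s0, dt, 4000)   # training data: forward Euler
test_rk4 = trajectory(rk4_step, s0, dt, 4000)  # unseen data: RK4, same dynamics

# Fit the state increment s_{n+1} - s_n as a linear combination of the features.
W, *_ = np.linalg.lstsq(features(train[:-1]), train[1:] - train[:-1], rcond=None)

err_euler = one_step_rmse(W, train)
err_rk4 = one_step_rmse(W, test_rk4)
print(f"one-step RMSE on Euler data: {err_euler:.3e}")
print(f"one-step RMSE on RK4 data:   {err_rk4:.3e}")
```

In this sketch the fitted weights reproduce the Euler update itself (for example, the weight of `x` in the first increment is `-dt * SIGMA`), so the error on Euler data sits near floating-point round-off, while the error on RK4 data is set by the difference between the two schemes. This is only a one-step comparison on a toy setup; it does not reproduce the article's experiments or its multi-Lyapunov-time forecasts.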