August 24, 2020
Journal Article

General Protocol for the Accurate Prediction of Molecular 13C/1H NMR Chemical Shifts via Machine Learning Augmented DFT

Abstract

Accurate prediction of NMR chemical shifts at affordable computational cost is of great importance for many kinds of structural assignments in experimental studies. However, in many cases the most popular computational method for NMR calculations, density functional theory (DFT) combined with gauge-including atomic orbitals (GIAO), suffers from ambiguities in structural assignments. Herein, by using state-of-the-art machine learning (ML) techniques, a DFT+ML model has been developed that is capable of predicting the 13C/1H NMR chemical shifts of organic molecules with high accuracy. The input for the DFT+ML model contains two critical parts: one is a chemical environment descriptor vector, which can be evaluated without knowing the exact geometry of the molecule; the other is the DFT-calculated isotropic shielding constant. The DFT+ML model was trained with a dataset containing 476 13C and 270 1H experimental chemical shifts. For the DFT methods used here, the root mean square deviations between predicted and experimental 13C/1H chemical shifts can be as small as 2.10/0.18 ppm, which is much lower than those from simple DFT (5.54/0.25 ppm) or DFT+linear regression (4.77/0.23 ppm) approaches. The robustness of the DFT+ML model was tested on two classes of organic molecules (TIC10 and hyacinthacines), for which the correct isomers were unambiguously matched to the experimental ones. The DFT+ML model is a promising approach to NMR prediction and can be easily adapted to calculate shifts for any chemical compound.
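As an illustration of the DFT+linear regression baseline and the root mean square deviation metric mentioned in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes a simple least-squares fit of experimental shift against computed isotropic shielding; the function names `fit_linear_scaling` and `rmsd` are hypothetical, and the numerical values are made up for demonstration and are not taken from the paper's dataset.

```python
import numpy as np


def fit_linear_scaling(shielding, exp_shift):
    """Least-squares fit of: shift = slope * shielding + intercept."""
    slope, intercept = np.polyfit(shielding, exp_shift, deg=1)
    return float(slope), float(intercept)


def rmsd(predicted, reference):
    """Root mean square deviation between two sets of chemical shifts (ppm)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Made-up illustrative data (NOT from the paper): computed isotropic
# shielding constants and the matching "experimental" shifts, in ppm.
# The shifts are constructed to follow shift = 185 - shielding exactly.
sigma = np.array([160.0, 140.0, 100.0, 60.0, 20.0])
delta_exp = np.array([25.0, 45.0, 85.0, 125.0, 165.0])

slope, intercept = fit_linear_scaling(sigma, delta_exp)
delta_pred = slope * sigma + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"RMSD={rmsd(delta_pred, delta_exp):.3f} ppm")
```

In the DFT+ML approach described above, the linear fit would be replaced by a learned model whose input combines the shielding constant with a chemical environment descriptor vector; the abstract does not specify that model, so it is not sketched here.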

Revised: September 29, 2020 | Published: August 24, 2020

Citation

Gao P., J. Zhang, Q. Peng, J. Zhang, and V. Glezakou. 2020. General Protocol for the Accurate Prediction of Molecular 13C/1H NMR Chemical Shifts via Machine Learning Augmented DFT. Journal of Chemical Information and Modeling 60, no. 8:3746-3754. PNNL-SA-143566. doi:10.1021/acs.jcim.0c00388