In the last few years, deep learning has been applied to a broad range of computational chemistry research problems. Using human-engineered chemical features, such as molecular descriptors and fingerprints, deep learning models have performed as well as, if not better than, most traditional machine learning algorithms. Recently, we reported on the development of Chemception, a deep convolutional neural network (CNN) architecture for general-purpose small-molecule property prediction. On average, Chemception matched the performance of expert-developed QSAR/QSPR models trained on chemical features (molecular fingerprints), despite being trained only on 2D images of molecular drawings with minimal chemical information. Here, we investigate the effects of systematically removing and adding basic chemical information to the image channels of the 2D images used to train Chemception. By augmenting our images with only three additional pieces of basic chemical information, we improve Chemception's performance: it is now more accurate than contemporary deep learning models trained on ECFP fingerprints for the prediction of toxicity, activity, and solvation free energy, and more accurate than physics-based free energy simulation methods for computing solvation properties. By altering the chemical information content in the image channels and examining the resulting performance of Chemception, we also identify two different “learning patterns” for toxicity/activity as compared to solvation free energy, which parallel the fundamental differences in how contemporary chemistry research approaches the prediction of toxicity/activity and solvation free energy.
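As a rough illustration of what adding or removing chemical information in image channels could look like, the sketch below places per-atom property values onto a small multi-channel grid. It is a minimal sketch under stated assumptions, not the authors' implementation: the function `rasterize`, the grid size, the resolution, the hand-placed coordinates, and the channel names `atomic_number` and `partial_charge` are all hypothetical choices made for illustration and are not taken from the abstract.

```python
from typing import Dict, List, Sequence

Atom = Dict[str, float]
Image = List[List[List[float]]]  # indexed as [channel][row][col]


def rasterize(atoms: Sequence[Atom], channels: Sequence[str],
              size: int = 8, resolution: float = 0.5) -> Image:
    """Write each atom's value for every requested channel onto a size x size grid.

    Hypothetical helper: one grid per channel name, zeros where no atom sits.
    """
    image = [[[0.0] * size for _ in range(size)] for _ in channels]
    for atom in atoms:
        # Map 2D coordinates to pixel indices, with the origin at the grid center.
        col = int(round(atom["x"] / resolution)) + size // 2
        row = int(round(atom["y"] / resolution)) + size // 2
        if not (0 <= row < size and 0 <= col < size):
            continue  # atom falls outside the canvas; skipped in this sketch
        for c, name in enumerate(channels):
            image[c][row][col] = atom[name]
    return image


# Toy input (assumed values): a two-atom C-O fragment with hand-placed coordinates.
atoms = [
    {"x": -0.5, "y": 0.0, "atomic_number": 6.0, "partial_charge": 0.1},
    {"x": 0.5, "y": 0.0, "atomic_number": 8.0, "partial_charge": -0.4},
]

# "Adding" information means requesting more channels; "removing" means fewer.
full = rasterize(atoms, ["atomic_number", "partial_charge"])
reduced = rasterize(atoms, ["atomic_number"])
```

In this sketch the channel list is the only thing that changes between the two images, so a model trained on `full` versus `reduced` would see the same molecular layout with different amounts of chemical information.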
Revised: August 22, 2019
Published: May 7, 2018
Citation
Goh G.B., C.M. Siegel, A. Vishnu, N.O. Hodas, and N.A. Baker. 2018. How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions? In IEEE Winter Conference on Applications of Computer Vision (WACV 2018), March 12-15, 2018, Lake Tahoe, NV, 1340-1349. Piscataway, New Jersey: IEEE. PNNL-SA-127201. doi:10.1109/WACV.2018.00151