July 14, 2020
Journal Article

A Look Inside the Black Box: Using graph-theoretical descriptors to interpret a Continuous-Filter Convolutional Neural Network (CF-CNN) trained on the global and local minimum energy structures of neutral water clusters

Abstract

A Continuous-Filter Convolutional Neural Network (CF-CNN) was trained to predict the potential energy of water cluster networks \ce{(H2O)_{\textit{N}}}, \textit{N}=10--30, corresponding to local minima lying within 5 kcal/mol of the putative minima, taken from a newly published database containing over 5 million unique networks. The chemical sampling space of the database was characterized using chemical descriptors derived from graph theory, which led to the identification of important trends in the topology, connectivity, and polygon structures associated with the various networks as a function of cluster size. The resulting graphs are available alongside the original database at \url{https://sites.uw.edu/wdbase/}. The CF-CNN trained on a subset of 500,000 networks for (\textit{N}=10, 30) yielded a mean absolute error of 0.002$\pm$0.002 kcal/mol per water molecule, giving the trained CF-CNN the highest accuracy of any neural network-based surrogate model to date. In addition, clusters of sizes not included in the training set exhibited errors of the same magnitude, indicating that the CF-CNN protocol is general enough to accurately predict energies of networks both smaller and larger than those used during training. The graph-theoretical descriptors were developed in order to analyze the properties of the full database and interpret the predictive power of the CF-CNN. Using topology measures, such as the Wiener index and the average shortest path length, along with two similarity measures, we showed that all networks from the test set were within the range of those from the training set, suggesting that the training set covered the chemical space of interest well. Our graph analysis suggests that the mean degree and number of polygons for networks with larger errors tend to lie further from the mean than those for networks with lower errors. The generality of the CF-CNN was thus demonstrated, while the graph-theoretical descriptors assisted in interpreting the predicted results.
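
As an illustration of the topology measures named in the abstract, the following minimal sketch computes the mean degree, the Wiener index (the sum of shortest-path distances over all unordered node pairs), and the average shortest path length for a small graph. It is not the authors' code: the function names, the undirected-graph simplification, and the six-membered ring used as input are assumptions chosen for the example, with each node standing in for one water molecule and each edge for one hydrogen bond.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def graph_descriptors(edges):
    """Return mean degree, Wiener index, and average shortest path length
    for a connected, undirected graph given as a list of (u, v) edges."""
    # Build an adjacency map; treating edges as undirected is a simplification.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    nodes = sorted(adjacency)
    n = len(nodes)
    all_distances = {node: shortest_path_lengths(adjacency, node) for node in nodes}

    # Wiener index: sum of shortest-path distances over all unordered node pairs.
    wiener = sum(all_distances[u][v] for u, v in combinations(nodes, 2))
    n_pairs = n * (n - 1) // 2
    return {
        "mean_degree": sum(len(adjacency[node]) for node in nodes) / n,
        "wiener_index": wiener,
        "average_shortest_path_length": wiener / n_pairs,
    }


# Hypothetical input: a six-membered ring (nodes 0..5, each bonded to two neighbors).
ring_edges = [(i, (i + 1) % 6) for i in range(6)]
descriptors = graph_descriptors(ring_edges)
print(descriptors)
```

For a connected graph the average shortest path length is simply the Wiener index divided by the number of node pairs, which is why the two measures are computed together here.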

Revised: August 5, 2020 | Published: July 14, 2020

Citation

Bilbrey J.A., J. Heindel, M. Schram, P. Bandyopadhyay, S.S. Xantheas, and S. Choudhury. 2020. A Look Inside the Black Box: Using graph-theoretical descriptors to interpret a Continuous-Filter Convolutional Neural Network (CF-CNN) trained on the global and local minimum energy structures of neutral water clusters. Journal of Chemical Physics 153, no. 2:024302. PNNL-SA-152462. doi:10.1063/5.0009933