December 28, 2020
Journal Article

Chespa: Streamlining Expansive Chemical Space Evaluation of Molecular Sets

Abstract

Thousands of chemical properties can be calculated for small molecules, and these properties can be used to place the molecules within the context of a broader “chemical space.” Chemical space definitions vary with the compounds of interest and the goals of the analysis. Here, we introduce a customizable (i.e., modular) Python module, chespa, built to easily assess different chemical space definitions by clustering compounds within each space and visualizing trends across the resulting clusters. To demonstrate this, chespa currently streamlines the prediction of various molecular descriptors (predicted chemical properties, molecular substructures, AI-based chemical space, and chemical class ontology), which are used to test six different chemical space definitions. Furthermore, we investigated how these varying definitions trend with mass spectrometry (MS)-based observability, i.e., the ability of a molecule to be observed with MS (e.g., as a function of the molecule's ionizability), using an example data set from the U.S. EPA's Non-Targeted Analysis Collaborative Trial (ENTACT), in which blinded samples had previously been analyzed, providing 1,398 data points. Improved understanding of observability would offer many advantages in small-molecule identification, such as (i) a priori selection of experimental conditions based on suspected sample composition, (ii) the ability to reduce the number of candidate structures during compound identification by removing those less likely to ionize, and, in turn, (iii) a reduced false discovery rate and increased confidence in identifications. The factors controlling observability are not fully understood, which makes this property non-trivial to predict and a prime candidate for chemical space analysis. Chespa is available at github.com/pnnl/chespa.
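The general workflow described in the abstract (place molecules in a descriptor-defined chemical space, cluster them, then check how a property such as MS observability trends across clusters) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not chespa's implementation or API: descriptors are assumed to be precomputed numeric vectors, observability is assumed to be a per-molecule boolean, and plain k-means (Lloyd's algorithm) is swapped in as a stand-in clustering method. All function names and the toy input values are hypothetical and are not ENTACT data.

```python
import math
from typing import Dict, List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each descriptor column so no single property dominates distances."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    # Population standard deviation; constant columns fall back to 1.0.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the closest centroid by squared Euclidean distance (ties -> lowest index)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points: List[List[float]], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means with a deterministic initialization (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        for c in range(k):  # update step: move each centroid to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def observed_fraction_by_cluster(labels: Sequence[int],
                                 observed: Sequence[bool]) -> Dict[int, float]:
    """Fraction of molecules in each cluster that were observed by MS."""
    totals: Dict[int, int] = {}
    hits: Dict[int, int] = {}
    for lab, obs in zip(labels, observed):
        totals[lab] = totals.get(lab, 0) + 1
        hits[lab] = hits.get(lab, 0) + int(obs)
    return {lab: hits[lab] / totals[lab] for lab in sorted(totals)}


# Toy, hypothetical inputs (not ENTACT data): two descriptors per molecule,
# e.g. a predicted logP and a monoisotopic mass, plus an observed/not-observed flag.
descriptors = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0],
               [5.0, 600.0], [5.2, 620.0], [4.8, 590.0]]
observed = [True, True, False, False, False, True]

labels = kmeans(standardize(descriptors), k=2)
fractions = observed_fraction_by_cluster(labels, observed)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(fractions)  # {0: 0.6666666666666666, 1: 0.3333333333333333}
```

Repeating the same steps with a different descriptor set (i.e., a different chemical space definition) and comparing how strongly the observed fraction differs between clusters is one simple way to compare definitions; a real analysis would also need to choose the number of clusters and the clustering method with care.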

Revised: December 31, 2020 | Published: December 28, 2020

Citation

Nunez J., M.Y. McGrady, Y. Yesiltepe, R.S. Renslow, and T.O. Metz. 2020. Chespa: Streamlining Expansive Chemical Space Evaluation of Molecular Sets. Journal of Chemical Information and Modeling 60, no. 12:6251–6257. PNNL-SA-154255. doi:10.1021/acs.jcim.0c00899