November 21, 2024
Report

Molecular Vision - Multimodal, multitask retrieval of molecular structure from measured signatures for reference-free compound identification

Abstract

Limited methods for identifying small molecules in biological systems and in chemical forensics currently put us at risk of drawing false conclusions. By definition, novel small molecules have not had their structures determined, let alone been measured or synthesized. Unambiguous structure determination of small molecules is currently constrained by the time and effort needed to isolate compounds and perform de novo structure elucidation using laboratory-based methods, which significantly delays the information needed to guide mitigation strategies. To address this gap, we have developed a deep learning approach that directly maps molecular structure to experimental signatures. We aim to unify the measurement technologies employed in untargeted small molecule identification studies, such as infrared (IR) spectroscopy, tandem mass spectrometry (MS/MS), and ion mobility spectrometry-derived collision cross section (CCS), through a multimodal, multitask deep learning architecture. Whereas existing methods require direct generation of information-rich spectra and/or properties, an inherently difficult task, we simplify molecular signature-based identification by posing it as a recognition or retrieval task. The model is presented with the relevant endpoints (a structure and one or more molecular signatures) and need only determine whether they are semantically related. Our approach therefore offers the following advantages over existing techniques: it (i) circumvents the difficulties of directly generating molecular signatures from structure and structure from signatures; (ii) incorporates multiple molecular signatures simultaneously, as available, to support identification; and (iii) enables rapid computation of structural embeddings, allowing broad coverage of known chemical space.
Taken together, the approach removes the need to explicitly obtain or compute reference spectra, representing a powerful method for compound identification that requires only experimentally observed signatures.
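To make the retrieval framing concrete, the sketch below shows one common way such a system can be built. The report does not specify its architecture, training objective, or fusion rule, so everything here is an assumption for illustration only. It assumes a CLIP-style symmetric contrastive (InfoNCE) objective to learn whether a structure and a signature are related, averaging of unit-normalized embeddings to combine whichever signatures are available, and cosine-similarity ranking against a precomputed library of structure embeddings. The trained encoders are replaced by random vectors, and the function names (`info_nce_loss`, `fuse_signatures`, `retrieve`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def info_nce_loss(struct_emb, sig_emb, temperature=0.07):
    """Assumed symmetric contrastive (CLIP-style) loss over a batch of pairs.

    Row i of struct_emb and row i of sig_emb describe the same molecule;
    every other row in the batch acts as a negative example.
    """
    s = l2_normalize(struct_emb)
    g = l2_normalize(sig_emb)
    logits = s @ g.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(logits))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()  # matched pairs sit on the diagonal

    # Average the structure->signature and signature->structure directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def fuse_signatures(sig_embs):
    """Hypothetical fusion rule: average the unit vectors of whichever
    signature embeddings (e.g. IR, MS/MS, CCS) are available."""
    stacked = np.stack([l2_normalize(e) for e in sig_embs])
    return l2_normalize(stacked.mean(axis=0))


def retrieve(query_emb, library_embs, k=5):
    """Rank precomputed structure embeddings by cosine similarity to a query."""
    scores = l2_normalize(library_embs) @ l2_normalize(query_emb)
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Demo with synthetic stand-ins for trained encoder outputs
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 128))  # stand-in structure embeddings
# Two noisy "views" of molecule 42, standing in for IR and MS/MS embeddings
ir_view = library[42] + 0.5 * rng.normal(size=128)
msms_view = library[42] + 0.5 * rng.normal(size=128)
query = fuse_signatures([ir_view, msms_view])
top_idx, top_scores = retrieve(query, library, k=5)
print(top_idx[0])  # the true structure, 42, should rank first
```

The design point this illustrates is that library structure embeddings are computed once, ahead of time, so identifying an unknown requires only encoding its measured signatures and a single similarity search; no reference spectrum is generated or looked up at query time.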

Published: November 21, 2024

Citation

Chang C.H., S.M. Colby, J.L. Cooper, C.N. Svinth, and J.L. Yaros. 2024. Molecular Vision - Multimodal, multitask retrieval of molecular structure from measured signatures for reference-free compound identification. Richland, WA: Pacific Northwest National Laboratory.