November 21, 2024
Report
Statistically-driven Experimental Design to Improve Reference-free Quantification of Small Molecules by Liquid Chromatography-Mass Spectrometry
Abstract
Non-targeted analysis of small molecules and metabolites in unknown, complex samples using liquid chromatography-tandem mass spectrometry (LC-MS/MS) remains challenging. One of the main bottlenecks is the large share of metabolomics mass spectrometry data that remains unannotated, which leaves knowledge gaps. Small molecule annotation in mass spectrometry data has conventionally relied on reference standards and libraries for compound identification and confirmation. This reliance constrains identification to molecules that are already known and limits the discovery of new knowledge and new markers. Retention time prediction can facilitate and expedite the identification of unknown compounds in non-targeted analysis of complex metabolomics samples, and accurate retention time predictions can also inform sample mixture design for LC-MS/MS analyses. However, current machine learning-based methods for retention time prediction are typically developed for specific chromatographic platforms and are not generalizable across scales. In addition, while technologies and methods to improve reference-free metabolite identification for more comprehensive annotation of unknowns have received much attention, development of comparable methods for quantitation without reference standards has been much more limited, despite its importance in toxicological, environmental, food safety, forensic, and clinical applications. We believe that a reference-free quantitation strategy that exploits mass spectrometry data already collected for reference-free identification can provide much more insight into unknowns and move the metabolomics field toward more complete characterization of unknowns. We therefore pursue two efforts to improve upon current state-of-the-art methods in non-targeted analysis: (1) machine learning-based retention time prediction and (2) a statistical design-of-experiments framework for reference-free quantitation.
In this work, we develop and demonstrate (1) a retention time prediction capability that generalizes across chromatographic conditions and scales, and (2) a statistical design-based framework for elucidating contributions to the response factor and for reference-free quantitation. Evaluation of our retention time prediction model, PrediToR, showed approximately 24% improvement over current models, and we observed an approximately 10-fold improvement in concentration estimation accuracy from our statistical design-based response factor model over a primarily ionization efficiency-based model. We expect that future efforts to improve upon these new capabilities will further advance non-targeted analysis of small molecules towards truly reference-free metabolomics.
Published: November 21, 2024