July 25, 2025
Journal Article
Data Set analysis to reduce uncertainty in formula assignments of ultrahigh resolution mass spectra
Abstract
Molecular formulas assigned to accurate, highly resolved m/z are often described as ‘unequivocal,’ implying that an assigned formula is the only possible formula that could be assigned to a measured m/z. Yet, it is rarely the case that a formula assigned with more than a few atoms is truly unequivocal, especially when elements other than C, H, and O are allowed in the assignment. This presents a challenge for the untargeted analysis of biological and environmental samples, which typically contain a vast array of heteroatom-containing metabolites and polydisperse organic matter. Allowing assignments with heteroatoms is necessary to accurately characterize these samples, but doing so inevitably introduces more false assignments and greater uncertainty in the ecological, biological, and biogeochemical insights that are gleaned from these data. Addressing this challenge, we introduce a general strategy for filtering false assignments from datasets collected on ultrahigh resolution mass spectrometers. The strategy, which leverages CoreMS, employs the calculation of a confidence score for assigned monoisotopic formulas based on the detection and assignment of isotopologues, along with an analysis of mass error distributions to flag and correct or remove false assignments. We illustrate the implementation and utility of the strategy by examining molecular formula assignments across a set of oceanographic samples that were measured with 21 Tesla Fourier Transform Ion Cyclotron Resonance Mass Spectrometry.
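To make the idea of analyzing a mass error distribution concrete, the following is a minimal Python sketch. It is not the authors' CoreMS-based implementation. The function names (`ppm_error`, `flag_outliers`), the use of the median absolute deviation as the spread estimate, and the threshold of 3 scaled deviations are all assumptions chosen for illustration. The sketch computes the mass error of each assignment in parts per million and flags assignments whose error lies far from the bulk of the distribution.

```python
from statistics import median


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error of a measured m/z relative to the theoretical m/z, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def flag_outliers(errors_ppm: list[float], n_mads: float = 3.0) -> list[bool]:
    """Flag errors lying more than n_mads scaled MADs from the median.

    Hypothetical helper: the MAD-based rule and the default threshold are
    illustrative assumptions, not the criterion used in the article.
    """
    center = median(errors_ppm)
    mad = median([abs(e - center) for e in errors_ppm])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        # Degenerate case: more than half the errors are identical.
        return [e != center for e in errors_ppm]
    return [abs(e - center) > n_mads * scale for e in errors_ppm]


# Made-up mass errors (ppm) for six assignments; the last one sits far
# from the rest and is flagged for review.
example_errors = [0.05, 0.08, 0.02, 0.06, 0.04, 0.45]
example_flags = flag_outliers(example_errors)
```

A flagged assignment is only a candidate for correction or removal. In the strategy described above, this kind of check is combined with an isotopologue-based confidence score rather than used on its own.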