July 11, 2023
Journal Article

Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection

Abstract

Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms to match tandem mass spectra to sequences have undergone a parallel evolution, with both spectral alignment and peak matching being paired with diverse methods for scoring proteoform-spectral matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification through three distinct challenges. The first is identifying a large yield of PrSMs while controlling false discovery rate (FDR) in identifying thousands of proteoforms from complex cell lysates via four software workflows: ProSight Proteome Discoverer, TopPIC, Informed Proteomics, and pTop. The second is the deconvolution of data from both Thermo Orbitrap-class and Bruker maXis Q-TOF instruments to produce consistent precursor charge and mass determinations while generating fragment mass lists to optimize identification. The third attempts to detect diverse post-translational modifications (PTMs) in proteoforms from cow milk and human ovarian tissue. The data demonstrate that existing software suites produce admirable sensitivity, in some cases identifying a third of collected tandem mass spectra with FDR controlled below 2%; the overlap in these PrSMs, however, illustrates real value in searching data with multiple search engines. Differences among identification workflows seem to result from each search algorithm incorporating its own deconvolution algorithm. By transmitting deconvolution data from multiple deconvolution routes (Thermo Xtract, Bruker Auto MSn, Mascot Distiller, TopFD, and FLASHDeconv) to the downstream TopPIC search algorithm, we were able to detect common causes of deconvolution disagreement. 
The detection of PTMs was highly inconsistent among search algorithms: some workflows suggested that as few as 1% of PrSMs from cow’s milk were singly phosphorylated, while other workflows found that 18% of PrSMs were singly phosphorylated. Taken together, these results make a strong argument for top-down researchers to adopt a standard practice of analyzing each MS/MS experiment with at least two different search engines.

Citation

Tabb D., K. Jeong, K. Druart, M. Gant, K.A. Brown, C.D. Nicora, M. Zhou, et al. 2023. Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection. Journal of Proteome Research 22, no. 7:2199–2217. PNNL-SA-178802. doi:10.1021/acs.jproteome.2c00673