Bottom-up proteomics is increasingly being used to characterize unknown environmental, clinical, and forensic samples. Proteomics-based bacterial identification typically proceeds by tabulating peptide “hits” (i.e., confidently identified peptides) associated with the organisms in a database; organisms with enough hits are declared present in the sample. This approach has proven successful in laboratory studies, but important research gaps remain. First, the common-practice reliance on unique peptides for identification is susceptible to a phenomenon known as signal erosion. Second, no general guidelines are available for determining how many hits are needed to make a confident identification. These gaps inhibit the transition of this approach to real-world samples, where conditions vary and large databases may be needed. We propose statistical criteria for organism identification that apply regardless of sample quality or data analysis pipeline. These criteria are straightforward and produce a p-value for the result. We test the criteria on 919 LC-MS/MS datasets from bacterial and toxin samples acquired with multiple data collection platforms. Results reveal a >95% correct species-level identification rate, demonstrating the effectiveness and robustness of proteomics-based organism/toxin identification.
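The hit-tabulation idea described above can be illustrated with a minimal sketch. This is not the statistical criterion proposed in the paper, which the abstract does not specify. It assumes a simple binomial null model in which each confidently identified peptide matches a given organism by chance with a fixed probability. The function names, the `peptide_to_organisms` mapping, and the `null_rate` values are all hypothetical.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def organism_p_values(peptide_hits, peptide_to_organisms, null_rate):
    """Tabulate hits per organism and attach a p-value to each count.

    peptide_hits: iterable of confidently identified peptide sequences.
    peptide_to_organisms: dict mapping a peptide to the set of database
        organisms that contain it (shared peptides count for every organism).
    null_rate: dict mapping an organism to the ASSUMED probability that a
        random confident peptide matches it by chance (hypothetical input).
    """
    hits = set(peptide_hits)  # count each distinct peptide once
    counts = Counter()
    for peptide in hits:
        for organism in peptide_to_organisms.get(peptide, ()):
            counts[organism] += 1
    n = len(hits)
    # p-value: chance of seeing at least this many hits under the assumed null
    return {org: binomial_tail(k, n, null_rate[org]) for org, k in counts.items()}
```

In this sketch, an organism would be declared present when its p-value falls below a chosen significance threshold, rather than when its hit count exceeds an arbitrary cutoff. Counting shared peptides as well as unique ones is a choice made here for illustration only.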
Revised: January 12, 2021
Published: September 7, 2018
Citation
Jarman K.H., N.C. Heller, S.C. Jenson, J.R. Hutchison, B. Kaiser, S.H. Payne, and D.S. Wunschel, et al. 2018. Proteomics Goes to Court: A Statistical Foundation for Forensic Toxin/Organism Identification Using Bottom-Up Proteomics. Journal of Proteome Research 17, no. 9:3075-3085. PNNL-SA-133122. doi:10.1021/acs.jproteome.8b00212