February 24, 2006
Journal Article

Estimating Probabilities of Peptide Database Identifications to LC-FTICR-MS Observations

Abstract

Background: One of the grand challenges in the post-genomic era is proteomics, the characterization of the proteins expressed in a cell under specific conditions. A promising technology for high-throughput proteomics is mass spectrometry, specifically liquid chromatography coupled to Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR-MS). The accuracy and certainty of the determinations of peptide identities and abundances provided by LC-FTICR-MS are an important and necessary component of systems biology research. Methods: After a tryptically digested protein mixture is analyzed by LC-FTICR-MS, the observed masses and normalized elution times of the detected features are statistically matched to the theoretical masses and elution times of known peptides listed in a large database. The probability of matching is estimated for each peptide in the reference database using statistical classification methods assuming bivariate Gaussian probability distributions on the uncertainties in the masses and the normalized elution times. Results: A database of 69,220 features from 32 LC-FTICR-MS analyses of a tryptically digested bovine serum albumin (BSA) sample was matched to a database populated with 97% false positive peptides. The percentage of high confidence identifications was found to be consistent with other database search procedures. BSA database peptides were identified with high confidence on average in 14.1 of the 32 analyses. False positives were identified on average in just 2.7 analyses. Conclusions: Using a priori probabilities that contrast peptides from expected and unexpected proteins was shown to perform better in identifying target peptides than using equally likely a priori probabilities. This is because a large percentage of the target peptides were similar to unexpected peptides which were included to be false positives. The use of triplicate analyses with a “2 out of 3” reporting rule was shown to have excellent rejection of false positives.

Revised: March 24, 2006 | Published: February 24, 2006

Citation

Anderson K.K., M.E. Monroe, and D.S. Daly. 2006. Estimating Probabilities of Peptide Database Identifications to LC-FTICR-MS Observations. Proteome Science 4, no. 1:1-11. PNNL-SA-46729. doi:10.1186/1477-5956-4-1