March 15, 2008
Journal Article

Proteome-wide identification of proteins and their modifications with decreased ambiguities and improved false discovery rates using unique sequence tags

Abstract

Identifying proteins correctly and with known levels of confidence remains a significant challenge for proteomics. Random or decoy peptide databases are increasingly being used to estimate the false discovery rate (FDR), e.g., from liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of tryptic digests. We show that this approach can significantly underestimate the FDR, and describe an approach for more confident protein identifications that uses unique partial sequences derived from a combination of database searching and de novo-style data analyses of high precision MS/MS data. Applied to a Saccharomyces cerevisiae tryptic digest, the approach provided 3,132 confident peptide identifications (~5% modified in some fashion), covering 575 proteins with an estimated FDR of zero. The conventional approach provided 3,359 peptide identifications and 656 proteins with 0.3% FDR based upon a decoy database analysis. However, the present approach revealed ~5% of the 3,359 identifications to be incorrect, and many more to be potentially ambiguous (e.g., because certain amino acid substitutions and modifications were not considered). In addition, 677 peptides and 39 proteins were identified that had been missed by the conventional analysis, including non-tryptic peptides, peptides with various expected/unexpected chemical modifications, known/unknown posttranslational modifications, single nucleotide polymorphisms or gene encoding errors, and multiple modifications of individual peptides.

Revised: October 6, 2015 | Published: March 15, 2008

Citation

Shen Y., N. Tolic, K.K. Hixson, S.O. Purvine, L. Pasa-Tolic, W. Qian, J.N. Adkins, et al. 2008. Proteome-wide identification of proteins and their modifications with decreased ambiguities and improved false discovery rates using unique sequence tags. Analytical Chemistry 80, no. 6:1871-82. PNNL-SA-56012. doi:10.1021/ac702328x