Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics

December 1, 2007

Conference Paper

Abstract

Proteomics is a rapidly advancing field that offers a new perspective on biological systems. Mass spectrometry (MS) is a popular experimental approach because it allows high-throughput, global characterization of the proteins in a sample. Proteins are identified from the spectral signatures of their fragments, i.e., peptides. Peptide identification is typically performed with a computational database search algorithm; however, these algorithms return a large number of false positive identifications. We present a new scoring algorithm that uses a support vector machine (SVM) to integrate database scoring metrics with peptide physicochemical properties, improving the ability to separate true from false peptide identifications from MS. The Peptide Identification Classifier SVM (PICS) score, which uses only five variables, is significantly more accurate than the single best database metric, as quantified by the area under the Receiver Operating Characteristic (ROC) curve: ~0.94 versus ~0.90.
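The abstract does not specify the SVM kernel, the five variables, or the training procedure. As a minimal sketch under those unknowns, the Python below assumes a linear SVM trained by Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss, and computes ROC AUC as the probability that a randomly chosen true identification outscores a randomly chosen false one. The functions `train_linear_svm`, `roc_auc`, and `make_toy_data`, and the synthetic five-feature data, are hypothetical illustrations, not PICS itself.

```python
import random


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    Assumption: a LINEAR SVM; the paper's kernel and features are not given here.
    X: feature vectors (e.g. database scores plus peptide properties).
    y: labels, +1 for a true identification and -1 for a false one.
    Returns weights w; a peptide's score is dot(w, x). No bias is fitted, because
    a constant offset does not change the ranking, and hence not the ROC AUC.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1  # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinkage
            if violated:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def make_toy_data(n=200, n_features=5, seed=1):
    """Hypothetical synthetic data (NOT from the paper): Gaussian features
    centred at +1 for true and -1 for false identifications."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = 1 if k % 2 == 0 else -1
        X.append([rng.gauss(float(label), 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_toy_data(seed=1)
    X_test, y_test = make_toy_data(seed=2)  # held-out set for evaluation
    w = train_linear_svm(X_train, y_train)
    combined = roc_auc([dot(w, x) for x in X_test], y_test)
    single = max(roc_auc([x[j] for x in X_test], y_test) for j in range(len(X_test[0])))
    print(f"held-out AUC, best single feature: {single:.3f}")
    print(f"held-out AUC, linear SVM on all features: {combined:.3f}")
```

The demo compares the best single feature against the combined SVM score on held-out data, the same kind of comparison the abstract reports. Its printed numbers come from synthetic data and say nothing about PICS's actual performance. A linear model keeps the sketch dependency-free; a kernel SVM (e.g. RBF) would be the natural swap if the features interact nonlinearly.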

Revised: May 9, 2008 | Published: December 1, 2007

Citation

Webb-Robertson B.M., C.S. Oehmen, and W.R. Cannon. 2007. Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics. In The Sixth International Conference on Machine Learning and Applications (ICMLA ’07), 500-505. Washington, DC: IEEE Computer Society. PNNL-SA-58675. doi:10.1109/ICMLA.2007.17