Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

May 6, 2009

Book Chapter

Abstract

Accurate identification of peptides is a current challenge in mass spectrometry (MS)-based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics describing how well the experimental spectrum matches the theoretical spectrum of a peptide. The structure of these results makes it difficult to separate correct from false identifications and has created a false identification problem. Statistical confidence scoring is one approach to this false-positive problem and has led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics, and these vectors are used to train and validate the SVM. In practice, following the database search routine, a peptide is expressed in its vector representation and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.
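As a rough illustration of the workflow described in the abstract, the sketch below represents each candidate peptide match as a vector of search metrics, trains a linear SVM on labeled vectors, and reduces each vector to a single score. It is not the chapter's implementation: the two features are hypothetical placeholders for database search metrics, the training data are synthetic, the SVM is a plain linear model trained by stochastic subgradient descent on the hinge loss, and the raw decision value stands in for the statistical score.

```python
import random


def make_synthetic_matches(n_per_class, seed=0):
    """Build synthetic two-metric vectors (hypothetical stand-ins for search metrics)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        # Assumed: correct matches tend to have higher values on both metrics.
        X.append([rng.gauss(3.0, 0.3), rng.gauss(0.4, 0.05)])
        y.append(+1)
        # Assumed: incorrect matches tend to have lower values.
        X.append([rng.gauss(1.0, 0.3), rng.gauss(0.1, 0.05)])
        y.append(-1)
    return X, y


def decision_score(w, b, x):
    """Signed distance-like score: positive suggests a correct identification."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision_score(w, b, X[i]) < 1.0
            # Regularization step: shrink the weights slightly on every update.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss step: move the boundary toward satisfying the margin.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


X_train, y_train = make_synthetic_matches(100)
w, b = train_linear_svm(X_train, y_train)

# Classify each training vector by the sign of its score.
predictions = [1 if decision_score(w, b, x) > 0 else -1 for x in X_train]
accuracy = sum(p == t for p, t in zip(predictions, y_train)) / len(y_train)
```

A new peptide match would be scored by calling `decision_score(w, b, metrics)` on its metric vector and thresholding the result. In practice the model should be validated on held-out data rather than on the training set used here.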

Revised: November 12, 2010 | Published: May 6, 2009

Citation

Webb-Robertson B.M. 2009. Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search. In Mass Spectrometry of Proteins and Peptides: Methods in Molecular Biology Vol 146. New York, New York: Humana Press. PNNL-SA-51841.