MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification

November 1, 2011

Journal Article

Abstract

A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and related performance studies on a 400-core Hadoop cluster, using spectral datasets from environmental microbial communities as inputs, are also reported.

Revised: August 12, 2014 | Published: November 1, 2011

Citation

Kalyanaraman A., W.R. Cannon, B.K. Latt, and D.J. Baxter. 2011. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification. Bioinformatics 27, no. 21:3072-3073. PNNL-SA-83443. doi:10.1093/bioinformatics/btr523