A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.
Revised: August 12, 2014
Published: November 1, 2011
Citation
Kalyanaraman A., W.R. Cannon, B.K. Latt, and D.J. Baxter. 2011. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification. Bioinformatics 27, no. 21:3072-3073. PNNL-SA-83443. doi:10.1093/bioinformatics/btr523