PGP: Parallel Prokaryotic Proteogenomics Pipeline for MPI clusters, high-throughput batch clusters and multicore workstations

January 27, 2014

Journal Article

Abstract

Because most genome annotation pipelines consist of automated gene finding, they lack experimental validation of primary structure and must rely on DNA-centric sources of data. By analyzing proteomics mass spectrometry data, our protocol improves existing annotations by discovering novel genes and post-translational modifications (PTMs) and by correcting erroneous primary sequence annotations. The PGP pipeline is designed to run in a wide range of parallel Linux computing environments in order to address the high computational cost of proteomics data processing. It has already been used to improve the annotation of 46 genomes across the prokaryotic tree of life. Availability and Implementation: Source code is freely available from https://bitbucket.org/andreyto/proteogenomics under the GPL license. It is implemented in Python and C++. It bundles the Makeflow engine to execute the workflows.

Revised: May 22, 2014 | Published: January 27, 2014

Citation

Tovchigrechko A., P. Venepally, and S.H. Payne. 2014. PGP: Parallel Prokaryotic Proteogenomics Pipeline for MPI clusters, high-throughput batch clusters and multicore workstations. Bioinformatics 30, no. 10:1469-1470. PNWD-SA-10240. doi:10.1093/bioinformatics/btu051