November 17, 2011
Journal Article

Proteogenomic analysis of bacteria and archaea: A 46 organism case study

Abstract

Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. The use of proteomics data for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g., short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 701 novel proteins, 1365 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1095 signal peptides. Proteomics provides a powerful experimental data type to assess and improve the quality of genome annotation. A key advantage is the direct correlation between protein annotation and a protein-based assay. With the adoption of new sequencing technologies, which have higher error rates than Sanger-based methods, and the advances in proteomics, proteogenomics may become even more important in the future.

Revised: January 9, 2012 | Published: November 17, 2011

Citation

Venter E., R.D. Smith, and S.H. Payne. 2011. Proteogenomic analysis of bacteria and archaea: A 46 organism case study. PLoS One 6, no. 11:Article No. e27587. PNNL-SA-75723. doi:10.1371/journal.pone.0027587