Biological Sciences Division
Comprehensive Whole-Proteome Analysis Now Available to Scientific Community
Shows potential of mass-spec-based proteomics for improving genome and proteome annotations
Results: Researchers from the University of California, San Diego, the Burnham Institute for Medical Research and Pacific Northwest National Laboratory recently demonstrated the capability to rapidly and efficiently improve a proteome annotation. Their work highlights the potential for using mass spectrometry-based proteomics to complement genome sequencing and improve both genome and proteome annotations. The results appeared in the September 2007 issue of Genome Research.
The researchers used liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) to annotate the genome and proteome of the bacterium Shewanella oneidensis MR-1. While bacterial genome annotations have improved significantly in recent years, the number of sequenced bacterial genomes is rising sharply, far outpacing scientists' ability to validate the predicted genes, let alone annotate bacterial proteomes. Moreover, compared with gene annotation, techniques for annotating proteome features, such as post-translational chemical modifications, signal peptides and proteolytic events, are still in their infancy.
Why it matters: The work addresses the potential of proteomics measurements for understanding biological systems important to environmental remediation efforts. The study demonstrates that complementing every genome-sequencing project with an MS/MS project would significantly improve both genome and proteome annotations at a reasonable cost. It is also one of the most complete proteome analyses ever made available to the scientific community.
Methods: Using 0.7 terabytes (~150 DVDs) of data obtained from numerous proteomics studies of S. oneidensis MR-1, an important microbe for bioremediation, the research team generated the first comprehensive map of post-translational modifications for a bacterium. The massive volume of data, provided by PNNL's High-Throughput Proteomics Production Facility, was the result of steady growth in throughput and data production over recent years. It included samples from 17 cell culture conditions and comprised the largest LC-MS/MS dataset ever reported for a bacterium. The analysis also detected multiple genes that gene prediction programs had either missed or assigned incorrect start positions. Using very conservative cutoffs for peptide identifications, the team confirmed protein expression for 1,992 of the 4,928 genes predicted by The Institute for Genomic Research (TIGR).
They also corrected or redefined the boundaries of 38 genes, eight of which were new genes missing from the genome annotation, and provided evidence for 13 gene products previously annotated as pseudogenes. Additionally, the researchers confirmed signal peptide processing for 94 proteins. All of these results help verify and validate gene expression for use in future studies.
These charts show the correlations between the coverage of individual proteins by MS-identified peptides and their biological features deduced from comparative genomics. Among the proteins with the best coverage, there was a strong correlation with conservation across many genomes (blue bar chart) or with essentiality in Escherichia coli (red bar chart). The multicolored bar chart shows functional categories by coverage, demonstrating that some functional categories are better represented at particular levels of coverage in both the TIGR functional annotations and the SEED functional annotations of the Fellowship for Interpretation of Genomes.
Acknowledgments: The research team members are Nitin Gupta, Stephen Tanner, Vineet Bafna and Pavel Pevzner (UCSD); Robert Edwards and Andrei Osterman (Burnham Institute for Medical Research); and Navdeep Jaitly, Joshua Adkins, Mary Lipton, Margie Romine and Dick Smith (PNNL). Data were made available through the efforts of Ken Auberry, Shaun O'Leary, Tara Gibson, and Lee Ann McCue (PNNL). The work was supported by the Genomics: GTL program of DOE's Office of Biological and Environmental Research. Portions of the work were conducted in the Environmental Molecular Sciences Laboratory, a DOE national scientific user facility located at PNNL.
Reference: Gupta N, S Tanner, N Jaitly, JN Adkins, MS Lipton, R Edwards, MF Romine, A Osterman, V Bafna, RD Smith, and P Pevzner. 2007. "Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation." Genome Research 17:1362-1377.