January 1, 2008
Journal Article

Proteogenomics: the needs and roles to be filled by proteomics in genome annotation

Abstract

While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation – the process of identifying genes and assigning function to each gene in a genome sequence – provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify open reading frames in genome sequences. Because the approaches used to validate the presence of these open reading frames are typically based on information derived from annotated genomes, they cannot independently and unequivocally determine whether a predicted open reading frame is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations, and we suggest that it could be efficiently applied during the initial gene calling process so that the improvements are propagated through the subsequent functional annotation process.

Revised: October 16, 2015 | Published: January 1, 2008

Citation

Ansong C., S.O. Purvine, J.N. Adkins, M.S. Lipton, and R.D. Smith. 2008. Proteogenomics: the needs and roles to be filled by proteomics in genome annotation. Briefings in Functional Genomics and Proteomics 7, no. 1:50-62. PNNL-SA-57607.