May 1, 2006
Journal Article

Software to perform automated comparisons of pairwise percent identities for microbial species

Abstract

The field of comparative genomics, which makes inferences about the properties of an organism by comparing its genome to the genomes of related organisms, has grown rapidly as a result of high-throughput sequencing initiatives. At the time of this study, there were almost 300 completely sequenced microbial genomes and over 500 partially sequenced microbial genomes. This wealth of genomic data means that, for a given species, there will likely be many related species with sequenced genomes available for comparison. This is useful, for example, when identifying transcription factor binding sites using phylogenetic footprinting. Including closely related species has been shown to be valuable in phylogenetic footprinting studies, because species within a close phylogenetic group are likely to share common regulatory mechanisms (1,2). Regulatory motif detection is complicated, however, by the sequence correlation present between related species. Here we define sequence correlation as similarity between orthologous sequences that is due to recent speciation rather than functional constraints. It is therefore useful to understand the level of correlation in the sequence data before initiating a focused comparative genomics study such as phylogenetic footprinting. We present two programs (collect.identity.pl and analyze.identity.pl) that automate the tasks of performing pairwise sequence alignments between sets of homologous sequences and generating summary statistics, as well as the data required for additional statistical analyses. This provides a way to compare a focused group of related genomes across a large number of homologous loci.
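
To make the idea of pairwise percent identity and its per-species-pair summary concrete, here is a minimal Python sketch. It is not the authors' Perl implementation and does not reproduce the behavior of collect.identity.pl or analyze.identity.pl. It assumes the pairwise alignments have already been computed, that gaps are written as "-", and that percent identity is measured over ungapped columns only (one of several conventions in use). The function names, the record layout, and the species and locus labels are all hypothetical.

```python
from statistics import mean, median


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (a convention chosen for this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def summarize(records):
    """Group percent identities by species pair and report simple statistics.

    records: iterable of (locus, species_a, species_b, aligned_a, aligned_b).
    Returns {(species_a, species_b): {"n": ..., "mean": ..., "median": ...}}.
    """
    per_pair = {}
    for _locus, sp_a, sp_b, aln_a, aln_b in records:
        pair = tuple(sorted((sp_a, sp_b)))  # treat (a, b) and (b, a) as one pair
        per_pair.setdefault(pair, []).append(percent_identity(aln_a, aln_b))
    return {
        pair: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for pair, vals in per_pair.items()
    }


# Hypothetical toy data: three pairwise alignments across two loci.
records = [
    ("locusA", "sp1", "sp2", "ATGC-A", "ATGCTA"),
    ("locusB", "sp1", "sp2", "GGCC", "GGCA"),
    ("locusA", "sp1", "sp3", "ATGCA", "ATGGA"),
]
summary = summarize(records)
for pair, stats in sorted(summary.items()):
    print(pair, stats)
```

Keeping the per-locus values grouped by species pair, rather than collapsing them straight to a mean, leaves the raw data available for any additional statistical analysis of how identity varies across loci.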

Revised: October 25, 2007 | Published: May 1, 2006

Citation

Conlan S., and L.A. McCue. 2006. Software to perform automated comparisons of pairwise percent identities for microbial species. BioTechniques 40, no. 5:578-582. PNWD-SA-7338.