Because the number of sequenced genomes is growing exponentially, fast and accurate annotation of existing and new sequences is essential to biological research. Current sequence comparison approaches fail to detect homologous relationships when sequence similarity is low. Support vector machine (SVM) algorithms approach this problem by transforming all proteins into a feature space of equal dimension based on protein properties, such as sequence similarity scores against a basis set of proteins or motifs. This multivariate representation of the protein space is then used to build a classifier specific to a pre-defined protein family. However, this approach is not well suited to large-scale annotation. We have developed an SVM HOmology Tool (SHOT) that formulates remote homology as a single classifier that answers the pairwise comparison problem. SHOT integrates the two feature vectors for a pair of sequences into a single vector representation that can be used to build a classifier that separates sequence pairs into homologs and non-homologs, in lieu of pre-defined families. SHOT attains homology scores in run-times competitive with the state-of-the-art PSI-BLAST algorithm. In addition, SHOT yields a dramatic increase in the number of accurate identifications on the benchmark dataset, quantified as the area under the Receiver Operating Characteristic curve: 0.97 for SHOT versus 0.73 and 0.70 for PSI-BLAST and BLAST, respectively.
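The pairwise formulation and the evaluation metric described above can be illustrated with a minimal Python sketch. The abstract does not state how SHOT integrates the two feature vectors, so the combination used here (element-wise absolute difference followed by element-wise product, which makes the result independent of sequence order) is an assumption chosen for illustration, not the published method. The function names `integrate_pair` and `roc_auc` and the toy feature values are likewise hypothetical. The area under the ROC curve is computed in its standard rank form: the fraction of (homolog, non-homolog) pairs in which the homolog receives the higher score, with ties counted as one half.

```python
from itertools import combinations


def integrate_pair(u, v):
    """Combine two equal-length feature vectors into one pair vector.

    Assumption (not from the source): the pair vector is the element-wise
    absolute difference followed by the element-wise product, so that
    integrate_pair(u, v) == integrate_pair(v, u).
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have equal dimension")
    diffs = [abs(a - b) for a, b in zip(u, v)]
    prods = [a * b for a, b in zip(u, v)]
    return diffs + prods


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a homologous pair, 0 for a non-homologous pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical feature vectors: similarity scores of each protein
# against a three-member basis set (values invented for illustration).
features = {
    "protA": [0.9, 0.1, 0.4],
    "protB": [0.8, 0.2, 0.5],
    "protC": [0.1, 0.9, 0.3],
}

# One integrated vector per unordered sequence pair; a single binary
# classifier (an SVM in SHOT) would be trained on vectors like these.
pair_vectors = {
    (a, b): integrate_pair(features[a], features[b])
    for a, b in combinations(sorted(features), 2)
}
```

In this sketch every sequence pair maps to a vector of twice the basis-set dimension, so one classifier can score any pair without reference to a pre-defined family. The classifier's decision values for labeled benchmark pairs would then be passed to `roc_auc` to obtain a figure comparable to the areas reported above.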
Revised: November 12, 2010
Published: December 1, 2008
Citation
Webb-Robertson B.M., C.S. Oehmen, and A.R. Shah. 2008. A Feature Vector Integration Approach for a Generalized Support Vector Machine Pairwise Homology Algorithm. Computational Biology and Chemistry 32, no. 6:458-461. PNWD-SA-8160. doi:10.1016/j.compbiolchem.2008.07.017