Motivation: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods that compare sequences on the basis of extracted features have been shown to be more accurate than traditional approaches. Extracting features in a computationally inexpensive manner that retains the sensitivity of SVM protein classification remains a topic of considerable interest. Furthermore, identifying features that can be used for generic pairwise homology detection, rather than only family-based homology detection, is important for broader application.
Results: We introduce a new approach to feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from the 4-mers of the sequence and normalized against the null distribution of the property over all possible 4-mers. Generating the features for a query sequence and classifying it carries little computational cost, and overall performance is comparable with current state-of-the-art methods. In addition, we demonstrate that the features can be used for the task of pairwise remote homology detection.
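The feature construction described in the Results can be illustrated with a small sketch. Everything specific here is an assumption, not the authors' published method: the Kyte-Doolittle hydropathy scale stands in for "a physicochemical property", a 4-mer's score is taken to be the sum of its per-residue values, bins are placed at quantiles of the null distribution over all 20^4 possible 4-mers, each bin is normalized by dividing the observed fraction by the null fraction, and the hypothetical `distance` function (Euclidean) stands in for a pairwise comparison.

```python
from bisect import bisect_right
from itertools import product
import math

# Kyte-Doolittle hydropathy values: an example property (assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4  # word length, as in the abstract


def kmer_score(kmer, scale=KD):
    """Assumed scoring: sum of per-residue property values in the k-mer."""
    return sum(scale[a] for a in kmer)


def build_null_model(n_bins=10, scale=KD):
    """Score all 20**K possible k-mers (the null distribution).

    Returns interior bin edges at null quantiles (a hypothetical binning
    choice) and the fraction of all possible k-mers falling in each bin.
    """
    null = sorted(kmer_score(c, scale) for c in product(scale, repeat=K))
    edges = [null[(i * len(null)) // n_bins] for i in range(1, n_bins)]
    counts = [0] * n_bins
    for s in null:
        counts[bisect_right(edges, s)] += 1
    return edges, [c / len(null) for c in counts]


def feature_vector(seq, edges, null_fracs, scale=KD):
    """Histogram of k-mer scores, each bin divided by its null fraction."""
    seq = seq.upper()
    counts = [0] * len(null_fracs)
    n = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if all(a in scale for a in kmer):  # skip k-mers with non-standard residues
            counts[bisect_right(edges, kmer_score(kmer, scale))] += 1
            n += 1
    if n == 0:
        return [0.0] * len(null_fracs)
    return [(c / n) / f if f > 0 else 0.0 for c, f in zip(counts, null_fracs)]


def distance(u, v):
    """Hypothetical pairwise comparison: Euclidean distance between vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    edges, null_fracs = build_null_model(n_bins=10)
    q = feature_vector("MKTAYIAKQRQISFVKSHFSRQ", edges, null_fracs)
    t = feature_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", edges, null_fracs)
    print(round(distance(q, t), 3))
```

The null model is built once (160,000 scores) and reused, so featurizing a query costs only one pass over its 4-mers, which is consistent with the low computational cost the abstract reports. A value above 1 in a bin means that score range is over-represented in the sequence relative to a uniform random 4-mer.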
Revised: November 4, 2010
Published: March 19, 2010
Citation
Webb-Robertson B.M., K. Ratuiste, and C.S. Oehmen. 2010. Physicochemical property distributions for accurate and rapid pairwise protein homology detection. BMC Bioinformatics 11. PNWD-SA-8719. doi:10.1186/1471-2105-11-145