Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume the statistical independence of each position in the sequence, but the energetics of protein folding is better captured in a scoring function that is based on pairwise interactions, like a force field. I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariant, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are present, where the covariances disfavor non-native structures. Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. We present an updated I-sites local structure motif library and web server that provide sequence covariance information for all types of local structure in globular proteins.
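The contrast between a profile and a pairwise covariant model can be illustrated with a minimal sketch. It assumes one common definition of covariance between two positions i and j, namely the joint frequency of a residue-class pair minus the product of the two single-position frequencies; the abstract does not state the exact measure used in the paper. The reduced alphabet below is a hypothetical grouping for illustration only, not the paper's nine classes, and the names `REDUCED`, `reduce_seq`, and `pairwise_covariance` are invented here.

```python
from collections import Counter

# Hypothetical reduced alphabet, for illustration only. The abstract says
# nine classes are used but does not list them, so this grouping is assumed.
REDUCED = {
    "A": "h", "V": "h", "L": "h", "I": "h", "M": "h",  # hydrophobic
    "D": "n", "E": "n",                                # negatively charged
    "K": "p", "R": "p",                                # positively charged
    "G": "g", "P": "P",
}

def reduce_seq(seq):
    """Map each residue to its reduced class ('x' for anything unlisted)."""
    return "".join(REDUCED.get(aa, "x") for aa in seq)

def pairwise_covariance(segments, i, j):
    """Return {(a, b): P_ij(a, b) - P_i(a) * P_j(b)} for positions i and j.

    `segments` is a list of equal-length reduced-alphabet strings that are
    assumed to share one local structure motif.
    """
    n = len(segments)
    joint = Counter((s[i], s[j]) for s in segments)
    p_i = Counter(s[i] for s in segments)
    p_j = Counter(s[j] for s in segments)
    return {
        (a, b): joint[(a, b)] / n - (p_i[a] / n) * (p_j[b] / n)
        for a in p_i
        for b in p_j
    }

# Toy data: opposite charges always pair up across positions 0 and 3.
segments = [reduce_seq(s) for s in ["DAAK", "EAAR", "KAAD", "RAAE"]]
cov = pairwise_covariance(segments, 0, 3)
```

In this toy set a profile sees each charge class with frequency 0.5 at both positions, so it scores a like-charged pair the same as an opposite-charged one. The covariance term separates them: it is positive for the opposite-charge pairs and negative for the like-charge pairs.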
Revised: November 12, 2010
Published: May 6, 2009
Citation
Bystroff C., and B.M. Webb-Robertson. 2009. Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure. BMC Bioinformatics 9, no. 429:10 pages. PNNL-SA-58982. doi:10.1186/1471-2105-9-429