March 1, 2014
Journal Article

Bayesian model aggregation for ensemble-based estimates of protein pKa values

Abstract

This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid p$K_a$ predictions. Structure-based p$K_a$ calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for p$K_a$ prediction, ranging from empirical statistical models to {\it ab initio} quantum mechanical approaches. However, each of these methods is based on a set of assumptions with inherent biases and sensitivities that can affect a model's accuracy and generalizability for p$K_a$ prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate p$K_a$ values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the p$K_a$ measurements are based on experimental work conducted by the Garc{\'i}a-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study, with improvements of 40-70\% over other method classes. This work illustrates a possible new mechanism for improving the accuracy of p$K_a$ prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.
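
As a rough illustration of the aggregation idea described in the abstract, the Python sketch below weights several prediction methods by how well each reproduces measured values, then combines their predictions for a new site. It is a minimal sketch under stated assumptions, not necessarily the fitting procedure used in the paper: posterior model weights are approximated with the BIC under Gaussian errors and equal prior model probabilities, the function names `bma_weights` and `bma_predict` are hypothetical, and all numbers are made up for illustration rather than taken from the study.

```python
import math

def bma_weights(predictions, observed):
    """Approximate posterior model weights from training-set errors.

    predictions: dict mapping method name -> list of predicted values
    observed:    list of measured values, same order as the predictions
    Assumption: Gaussian errors, equal model priors, BIC approximation.
    """
    n = len(observed)
    log_evidence = {}
    for name, preds in predictions.items():
        rss = sum((p - o) ** 2 for p, o in zip(preds, observed))
        sigma2 = max(rss / n, 1e-12)  # guard against log(0) for a perfect fit
        # One fitted parameter per method (the error variance) in this sketch.
        bic = n * math.log(sigma2) + 1 * math.log(n)
        log_evidence[name] = -0.5 * bic
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_evidence.values())
    unnorm = {k: math.exp(v - top) for k, v in log_evidence.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

def bma_predict(weights, new_predictions):
    """Weighted average of the per-method predictions for one new site."""
    return sum(weights[k] * new_predictions[k] for k in weights)

# Illustrative, made-up values (not data from the paper).
observed = [4.0, 6.5, 10.2, 3.9]
predictions = {
    "method_a": [4.2, 6.1, 10.0, 4.3],
    "method_b": [5.0, 7.5, 9.0, 3.0],
    "method_c": [4.1, 6.6, 10.5, 3.7],
}

weights = bma_weights(predictions, observed)
estimate = bma_predict(weights, {"method_a": 5.0, "method_b": 6.0, "method_c": 5.5})
print(weights)
print(estimate)
```

In this sketch the method with the smallest training error receives the largest weight, and the aggregate estimate always lies between the smallest and largest individual predictions. A full BMA treatment would also carry forward each method's predictive variance rather than only a point estimate.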

Revised: April 21, 2014 | Published: March 1, 2014

Citation

Gosink L.J., E.A. Hogan, T.C. Pulsipher, and N.A. Baker. 2014. Bayesian model aggregation for ensemble-based estimates of protein pKa values. Proteins: Structure, Function, and Bioinformatics 82, no. 3:354-363. PNNL-SA-95333. doi:10.1002/prot.24390