March 5, 2025
Journal Article
CaXML: Chemistry-informed Machine Learning Explains Mutual Changes between Protein Conformations and Calcium Ions in Calcium-binding Proteins Using Structural and Topological Features
Abstract
Protein fuzziness is a feature for communicating changes in cell signaling instigated by binding with secondary messengers, such as calcium ions, which are associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding with the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners, depending on the circumstances they transmit. It is unclear, however, whether the limited experimental data available can be used to train models that accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed machine learning approach that uses a game-theoretic method to explain the output of a machine learning model, achieving high-performance prediction of atomic charges without the prerequisite of an excessively large database. We used first-principles electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination number of a calcium ion. We found that the coordination chemistry of calcium ions is a potent indicator of their charges in proteins. With our design, we provide an explainable machine learning framework to annotate the atomic charges of calcium ions in calcium-binding proteins in response to chemical changes in the environment, based on limited scientific domain knowledge.
Published: March 5, 2025
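As an illustration only, the two ideas named in the abstract (a coordination-number feature and a game-theoretic explanation of a model's output) can be sketched in a few lines of Python. This is not the authors' implementation: the function names, the 3.0 angstrom cutoff, the three-feature layout, and every coefficient in toy_charge_model are hypothetical placeholders, and exact Shapley values are used here as one common game-theoretic attribution, which the abstract does not name explicitly.

```python
from itertools import combinations
from math import dist, factorial

def coordination_number(ca_xyz, atom_xyz, cutoff=3.0):
    """Count atoms within `cutoff` (assumed value, in angstroms) of the calcium ion."""
    return sum(1 for a in atom_xyz if dist(ca_xyz, a) <= cutoff)

def toy_charge_model(features):
    """Hypothetical stand-in for a trained model; coefficients are made up."""
    cn, n_water, mean_dist = features
    return 2.0 - 0.04 * cn - 0.02 * n_water + 0.10 * (mean_dist - 2.4)

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features absent from a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

# Illustrative coordinates: a calcium ion at the origin and three nearby atoms.
ca = (0.0, 0.0, 0.0)
neighbours = [(2.4, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 5.0)]
cn = coordination_number(ca, neighbours)

x = [float(cn), 1.0, 2.45]   # [coordination number, waters, mean Ca-O distance]
baseline = [6.0, 3.0, 2.40]  # hypothetical reference environment
phi = shapley_values(toy_charge_model, x, baseline)
print(cn, phi)
```

The attributions sum to the difference between the model output for x and for the baseline, so each entry of phi can be read as the share of the predicted charge shift assigned to that feature under these assumptions.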