Graph-Based Machine Intelligence to Automate Molecular Design

PI: Neeraj Kumar
Efficient discovery and production of new therapeutic candidates are essential for developing medicines that counter emerging threats. Understanding protein-ligand interactions (PLIs) is a critical step in small-molecule design. Current approaches to developing candidates for a given application are intuition-driven, hindered by slow iterative design-test cycles, and ultimately limited by the specific molecular expertise of the chemist and by bottlenecks in molecular design. Several technical challenges limit the use of machine learning for modeling enzymatic processes and accurately predicting PLIs. The first concerns the limited availability of 3D protein-ligand data for training machine learning models, and the way new chemical knowledge is currently generated. The second concerns appropriate data representation, i.e., determining which features of a biomolecular system are necessary and describing them in an algorithmically accessible form (domain knowledge).
To address these challenges, PNNL data scientists plan to create a highly efficient, artificial intelligence-driven molecular design framework that produces novel high-performance molecules with domain-targeted properties. We plan to implement a novel, distance-aware, graph-based approach to predict protein-ligand interactions and their binding affinities. We will develop novel parallel graph neural networks that integrate knowledge representation and reasoning, enabling deep learning that is guided by expert knowledge and informed by data.
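To make the idea of a distance-aware graph more concrete, the following is a minimal illustrative sketch in Python, not PNNL's implementation. It assumes each atom is given as an element label, 3D coordinates, and a protein/ligand flag; it connects atom pairs closer than a hypothetical 4.5 Å cutoff, weights each edge with an assumed Gaussian function of the interatomic distance, and runs two parallel message-passing branches (one over intra-molecular edges, one over protein-ligand edges) whose outputs are concatenated per atom. The names `Atom`, `build_distance_graph`, `gaussian_weight`, `message_pass`, and `parallel_layer`, the cutoff, and the kernel are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Atom:
    element: str                     # e.g. "C", "N", "O" (not used by the toy layer below)
    xyz: tuple[float, float, float]  # Cartesian coordinates in angstroms
    is_ligand: bool                  # True for ligand atoms, False for protein atoms


def build_distance_graph(atoms: list[Atom], cutoff: float = 4.5):
    """Connect every atom pair closer than `cutoff` (hypothetical value, in angstroms).

    Returns two edge lists of (i, j, d_ij) tuples: intra-molecular edges
    (protein-protein or ligand-ligand) and inter-molecular protein-ligand edges.
    """
    intra, inter = [], []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i].xyz, atoms[j].xyz)
            if d <= cutoff:
                edge = (i, j, d)
                if atoms[i].is_ligand != atoms[j].is_ligand:
                    inter.append(edge)
                else:
                    intra.append(edge)
    return intra, inter


def gaussian_weight(d: float, sigma: float = 1.0) -> float:
    """Assumed distance kernel: closer atom pairs exchange stronger messages."""
    return math.exp(-d * d / (2.0 * sigma * sigma))


def message_pass(features: list[list[float]], edges, sigma: float = 1.0):
    """One distance-weighted step: h_i' = h_i + sum_j w(d_ij) * h_j over undirected edges."""
    out = [list(h) for h in features]  # copy so every message uses the input features
    for i, j, d in edges:
        w = gaussian_weight(d, sigma)
        for k in range(len(features[i])):
            out[i][k] += w * features[j][k]
            out[j][k] += w * features[i][k]
    return out


def parallel_layer(features, intra_edges, inter_edges, sigma: float = 1.0):
    """Run the two edge types through separate branches and concatenate per atom."""
    intra_h = message_pass(features, intra_edges, sigma)
    inter_h = message_pass(features, inter_edges, sigma)
    return [a + b for a, b in zip(intra_h, inter_h)]  # list concatenation: 2 * dim values


if __name__ == "__main__":
    atoms = [
        Atom("C", (0.0, 0.0, 0.0), True),
        Atom("O", (1.2, 0.0, 0.0), True),
        Atom("N", (3.5, 0.0, 0.0), False),
        Atom("C", (10.0, 0.0, 0.0), False),
    ]
    intra, inter = build_distance_graph(atoms)
    print(len(intra), len(inter))  # prints "1 2"
    h = parallel_layer([[1.0] for _ in atoms], intra, inter)
    print(h[3])  # the isolated atom keeps its input in both branches: [1.0, 1.0]
```

In a trained model, the fixed Gaussian weights would typically be replaced by learned functions of distance, and the node embeddings would be pooled and passed to a regression head to estimate binding affinity; both steps are omitted from this sketch.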
PNNL’s approach will not only advance molecular design by decoding protein-ligand interactions but also provide the ability to identify and characterize novel candidates for a given protein target or disease. This would accelerate hit identification and lead optimization in therapeutic discovery, a much-needed capability today.