Data Scientist Fixes His Sights on Speeding Up Drug Discovery

November 25, 2024

AI, computing, and vast datasets combine to create new opportunities

Media Contact: PNNL News & Media Relations

Scientists at the Center for AI @PNNL are using generative AI, reinforcement learning and other forms of artificial intelligence to create and uncover compounds to treat human disease.

(Illustration by Timothy Holland | Pacific Northwest National Laboratory)

Tools are the lifeblood of people who tinker. For a carpenter, the basics include a hammer, nails, saw, and tape measure.

Sometimes the tools are used in new ways, transforming woodworker into artist.

His tools may be different, but Chief Data Scientist Neeraj Kumar is a tinkerer of sorts. And an artist. He uses tools like artificial intelligence (AI) and ever-more-powerful computing to sculpt new molecules, building computational models of potential drugs and exploring existing compounds to see if they might be used in new ways.

The convergence of AI, high-performance computing, and extensive drug data offers unprecedented opportunities in biomedicine. It’s a dynamic and competitive field, with biologists, data scientists, computational scientists, pharmaceutical firms, and AI developers all racing to discover new chemical compounds to treat human diseases.

An advantage for Kumar: He’s at Pacific Northwest National Laboratory (PNNL), immersed in an environment rich in AI development, high-performance computing, hardware-software codesign, biology, pathogen research, and molecular modeling. This collaborative ecosystem enables scientists to explore new ideas through initiatives such as Mathematics for Artificial Reasoning in Science (MARS), where researchers develop AI models that accelerate scientific discoveries. They also work together through the Center for AI @PNNL, where they further AI expertise and explore applications in areas such as science, national security, and energy resilience.

Kumar uses AI, both traditional and generative, to search for compounds that might fit a target protein in just the right way to stop disease.

Like most proteins, the target protein under study—whether involved in cancer, Alzheimer’s disease, or another condition—is a dynamic structure. At times, it might be balled up tight like a bowling ball, unwilling to interact with anything until the right signal comes along. At other times, it might extend and dangle its atoms and receptors, on the hunt for an appropriate partner to connect with. Its nooks and crannies, molecular mountains and valleys, are ever changing.

Creating a molecule that fits into this constantly shifting and unpredictable landscape under just the right circumstances is the goal of Kumar and many other scientists. Sometimes they speak of a lock-and-key mechanism: The protein lock needs just the right key to turn it on or off. But the metaphor is too simple. It’s more like a lock whose shape changes moment to moment, constantly demanding keys with different contours.

“Techniques such as X-ray crystallography give you a snapshot of the protein in a moment, but how the protein changes over time is very important and much harder to capture,” said Kumar.

The SARS-CoV-2 virus, the cause of COVID, is one target of drug discovery efforts at Pacific Northwest National Laboratory. (Illustration by Naeblys | Shutterstock.com)

Simply knowing the string of amino acids that make up a protein isn’t enough; it’s essential to understand how that code results in the protein’s function and 3D structure. That’s why Kumar and others celebrated several years ago when DeepMind demonstrated and released AlphaFold2, an AI program designed to predict the 3D structure of proteins from their amino acid sequences.

“Biology happens in three dimensions,” said Kumar. “You can’t get around that. And to work in 3D, with multiple parameters affecting the 3D structure of more than one molecule, you need exceptional computational power and energy-efficient AI architecture.”

While work like AlphaFold2 was a breakthrough in predicting protein structures that might be disease targets, that’s only half the equation. Scientists are left with the problem of designing molecules that could bind to those proteins, to turn them on or off.

Kumar used AI approaches known as reinforcement learning and deep learning to create a 3D scaffold technology that designs molecules step by step. The program analyzes the chemical environment around a scaffold (a starting point for a new molecule) and calculates which atom or chemical bond might work best next. It records the effect of each action as the molecule grows.

Every step of the proposed molecule, every “rough draft,” is saved to inform subsequent explorations. Generative AI makes it possible to propose a seemingly endless string of possibilities on the fly.
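For readers who think in code, the idea of growing a molecule one step at a time while saving every draft can be sketched as follows. This is a toy illustration only, not the PNNL technology: the atom list, the scores, and the names `FRAGMENT_SCORES`, `Draft`, and `Builder` are all hypothetical, and a simple greedy choice stands in for the learned model described above.

```python
from dataclasses import dataclass, field

# Hypothetical scores: stand-ins for whatever a trained model would
# predict from the chemical environment around the scaffold.
FRAGMENT_SCORES = {"C": 0.2, "N": 0.5, "O": 0.4, "F": 0.1}


@dataclass
class Draft:
    """One intermediate molecule ("rough draft") and the score it earned."""
    atoms: tuple
    score: float


@dataclass
class Builder:
    scaffold: tuple
    history: list = field(default_factory=list)

    def score(self, atoms):
        # Toy rule: sum the per-atom scores, minus a penalty whenever an
        # atom repeats its predecessor, so the best choice depends on context.
        total = 0.0
        for prev, cur in zip((None,) + atoms[:-1], atoms):
            total += FRAGMENT_SCORES[cur] - (0.3 if cur == prev else 0.0)
        return total

    def build(self, steps):
        atoms = self.scaffold
        for _ in range(steps):
            # Try every possible next atom and keep the best-scoring result.
            candidates = [atoms + (a,) for a in FRAGMENT_SCORES]
            atoms = max(candidates, key=self.score)
            # Save every draft so later explorations can draw on it.
            self.history.append(Draft(atoms, self.score(atoms)))
        return atoms


builder = Builder(scaffold=("C", "C"))
final = builder.build(steps=3)
```

After the run, `builder.history` holds one `Draft` per step, which is the piece that corresponds to saving each rough draft.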

“The target protein and the developing molecule are in a constant dance,” said Kumar. “Finally, we have the computational power to model this interaction. The 3D structures of the protein targets are available, and the drug candidates we are generating are also in 3D.”

Throughout this process, Kumar uses reinforcement learning to evaluate every action and reaction, optimizing the molecule’s design based on the predictive criteria.

Reinforcement learning is familiar to anyone with children. It’s a model that rewards behaviors and outcomes we want to encourage and punishes (or provides a lesser reward for) undesirable outcomes. The rewards come in the form of digits, with a nice big packet of digits for especially promising approaches.

Many ingredients make up the recipe for drug discovery. One is binding affinity: How tightly does the test molecule hold onto the target of interest? The tighter, like a molecular bear hug, the better.

Another is efficacy. How well does the molecule stop the protein from doing what it otherwise might, such as hooking up with another protein to begin an untoward chain of events that results in disease?

Yet another is synthetic accessibility. Can it be made practically?

And then there’s novelty. Finding a molecule that does those things but has never been considered before, at least in the public literature, is a big bonus.

“Biology happens in three dimensions,” says data scientist Neeraj Kumar, who has created a technology to design molecules step by step. (Photo courtesy of Neeraj Kumar)

“In principle, it’s very simple,” said Kumar. “You design a program that generates positive rewards and avoids negative rewards based on what you want it to do, which is to find an optimal drug. But, in practice, it’s very challenging to design an effective reward function.”
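A reward function of the kind Kumar describes might, in its simplest form, look like the sketch below. Everything here is an assumption made for illustration: the four scores are taken as already computed and scaled from 0 to 1, and the weights, the cutoff, and the names `Candidate`, `WEIGHTS`, and `reward` are hypothetical rather than drawn from the team's work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical, pre-computed scores, each assumed to lie in [0, 1].
    binding_affinity: float
    efficacy: float
    synthetic_accessibility: float
    novelty: float


# Assumed weights; choosing them well is the hard part.
WEIGHTS = {
    "binding_affinity": 0.4,
    "efficacy": 0.3,
    "synthetic_accessibility": 0.2,
    "novelty": 0.1,
}


def reward(c: Candidate, min_accessibility: float = 0.2) -> float:
    """Weighted sum of the criteria; negative if the molecule is impractical."""
    if c.synthetic_accessibility < min_accessibility:
        return -1.0  # punish candidates that cannot be made practically
    return sum(w * getattr(c, name) for name, w in WEIGHTS.items())


tight = Candidate(0.9, 0.7, 0.6, 0.5)
loose = Candidate(0.3, 0.7, 0.6, 0.5)
```

Here `reward(tight)` exceeds `reward(loose)` because only the binding score differs. Even in this toy version the difficulty is visible: shift the weights and a different molecule wins.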

Where does the program get its guiding principles? How does it tell good from bad?

For that, Kumar and his team draw on an immense, publicly available dataset of information about how thousands of medications have performed in the laboratory and in clinical trials. That includes information about their molecular workings and how they affected people who took the drugs. The program should not recommend molecular structures similar to those that have caused bad side effects and should favor structures that bind tightly to the protein target.
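One common way to put a number on "similar" is the Tanimoto coefficient, which compares two molecules by the structural features they share. The article does not say which similarity measure the team uses, so the sketch below is an assumption, as are the threshold and the function names; each molecule is represented as a plain set of hypothetical feature IDs.

```python
def tanimoto(a: set, b: set) -> float:
    """Shared features divided by all distinct features across both sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def side_effect_penalty(candidate: set, flagged: list, threshold: float = 0.7) -> float:
    """Return a penalty if the candidate closely resembles any flagged structure."""
    worst = max((tanimoto(candidate, f) for f in flagged), default=0.0)
    return -1.0 if worst >= threshold else 0.0


# Hypothetical feature sets for structures linked to bad side effects.
flagged_structures = [{1, 2, 3, 4, 5}, {10, 11, 12}]
```

A candidate with features `{1, 2, 3, 4}` shares four of five distinct features with the first flagged structure, so it would be penalized under the assumed threshold of 0.7.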

The data for judgment comes from thousands of drugs approved by the Food and Drug Administration as well as other compounds that have been created and evaluated.

The program doles out rewards and punishments and ranks the possibilities. Kumar might instruct the program to build 1,000 molecules to target a protein involved in diabetes, for instance. Then, he might trim that down to the best 100 and adjust the parameters more tightly, making it tougher for a molecule to earn a reward. For example, he might draw on data about whether the structures seem likely to provoke immune reactions or cross the blood-brain barrier.

From there, he might select the top 10 candidates, then consult with biomedical scientists experienced in the laboratory to get their input. Do the structures look feasible? What would it take to create these compounds in the laboratory—is it possible and practical?
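The narrowing from 1,000 candidates to 100 and then to 10 amounts to ranking, trimming, and ranking again under stricter rules. A minimal sketch, using randomly generated stand-in scores rather than real molecules (the `funnel` function and the tuple layout are hypothetical):

```python
import random


def funnel(candidates, score, strict_score, keep_first=100, keep_final=10):
    """Rank by a loose score, keep the best, then re-rank with a stricter one."""
    shortlist = sorted(candidates, key=score, reverse=True)[:keep_first]
    return sorted(shortlist, key=strict_score, reverse=True)[:keep_final]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Each hypothetical molecule: (id, binding score, blood-brain-barrier risk).
molecules = [(i, rng.random(), rng.random()) for i in range(1000)]

top10 = funnel(
    molecules,
    score=lambda m: m[1],
    strict_score=lambda m: m[1] - m[2],  # tougher: risk now counts against it
)
```

The final 10 would then go to laboratory scientists for the feasibility questions above, which no scoring function in this sketch attempts to answer.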

“We hope to design drug candidates faster, and we learn from previous successes and also the failures. Frankly, we want the ones that will fail to fail faster. It speeds up the drug discovery process,” said Kumar, whose work is the basis for a pending patent application, “Drug discovery via reinforcement learning with three-dimensional modeling.”

“Many drugs take billions of dollars for development. We’re trying to speed the pre-clinical development, reduce costs, and bring new treatments to patients more quickly,” he added.

Kumar demonstrated the potential in papers published in the Journal of Computer-Aided Molecular Design and the Journal of Chemical Information and Modeling. The team modeled potential drugs designed to stop an enzyme called Mpro, which plays an important role in the replication of SARS-CoV-2, the virus that causes COVID. Kumar’s team is working to demonstrate the technology in other disease states as well.

Current research interests include developing compounds designed to target molecular machinery that is common to many viruses. Coronaviruses, Zika, and dengue fever are among the targets in the project funded by the Defense Threat Reduction Agency. Kumar is also involved in multi-lab efforts to explore vaccine technology and to speed the discovery of new cancer treatments.

The 3D scaffold technology is available for licensing. For more information about licensing opportunities, contact PNNL’s Office of Collaboration & Commercialization at commercialization@pnnl.gov and reference Battelle IPID 32326-E.

Supporters of this work have included the Department of Energy’s National Virtual Biotechnology Laboratory, DOE’s Advanced Scientific Computing Research program, PNNL’s MARS Initiative, and PNNL’s Office of Collaboration and Commercialization.

###

About PNNL

Pacific Northwest National Laboratory draws on its distinguishing strengths in chemistry, Earth sciences, biology and data science to advance scientific knowledge and address challenges in energy resiliency and national security. Founded in 1965, PNNL is operated by Battelle and supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the DOE Office of Science website. For more information on PNNL, visit PNNL's News Center.