January 10, 2025

New AI Agent Connects Computer Reasoning with Chemistry

CACTUS provides AI-powered cheminformatics for autonomous science

[Image: a cactus robot holding molecules]

CACTUS integrates cheminformatics and large language models to enable molecular discovery. (Image by Nathan Johnson | Pacific Northwest National Laboratory)

At Pacific Northwest National Laboratory (PNNL), researchers are working to enable autonomous experimentation in biology, chemistry, and materials science. PNNL’s latest advancement toward achieving this goal is the development of CACTUS—the Chemistry Agent Connecting Tool Usage to Science. Led by Chief Data Scientist Neeraj Kumar, the researchers combined cheminformatics tools with artificial intelligence to create an agent capable of assisting researchers in the design of new molecules. 

“CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery,” said Kumar. “By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials.” 

Enhanced reasoning with chemistry tools

The initial idea behind CACTUS was to develop a large language model (LLM)-based cheminformatics assistant that researchers could ask questions about a molecule, such as how many hydrogen bonds it has or how toxic it is. Importantly, the team ensured that CACTUS could run on consumer-grade hardware as well as on supercomputers and that it could use open-source language models, making cheminformatics accessible to researchers with limited computational resources.

“We enable the acceleration of science by democratizing access to computational chemistry and modeling tools,” said Kumar. “By embracing open-source models and tools, agents like CACTUS make scientific discovery more accessible.”

Though it is based on an LLM, CACTUS does not rely solely on its own training data to answer questions. Instead, it acts as an agent that interfaces with existing computational chemistry tools the researchers have developed over the years. CACTUS interprets the user's question, determines which tool is best suited to answer it, formats the question into the correct input for that tool, runs the calculation, and then returns the tool's output to the user.
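The interpret, select, format, and return loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CACTUS implementation: the keyword matching stands in for the LLM's prompt-driven tool selection, the two toy tools work on simple molecular formulas rather than real cheminformatics inputs, and every name here (`TOOLS`, `select_tool`, `extract_formula`, `answer`) is hypothetical.

```python
import re
from typing import Callable, Dict, Tuple

# Approximate standard atomic masses (g/mol) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> Dict[str, int]:
    """Turn a simple formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    counts: Dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


# --- Toy "tools" (hypothetical stand-ins for real chemistry software) ---
def molecular_weight(formula: str) -> str:
    counts = parse_formula(formula)
    weight = sum(ATOMIC_MASS[element] * n for element, n in counts.items())
    return f"{weight:.2f} g/mol"


def heavy_atom_count(formula: str) -> str:
    counts = parse_formula(formula)
    return str(sum(n for element, n in counts.items() if element != "H"))


# Tool registry: name -> (trigger keywords, function).
# ASSUMPTION: keyword matching replaces the LLM's reasoning step.
TOOLS: Dict[str, Tuple[Tuple[str, ...], Callable[[str], str]]] = {
    "molecular_weight": (("weight", "mass"), molecular_weight),
    "heavy_atom_count": (("heavy atom",), heavy_atom_count),
}


def select_tool(question: str) -> str:
    """Step 1 and 2: interpret the question and pick a tool."""
    lowered = question.lower()
    for name, (keywords, _) in TOOLS.items():
        if any(keyword in lowered for keyword in keywords):
            return name
    raise ValueError("No tool matches this question")


def extract_formula(question: str) -> str:
    """Step 3: format the question into the input the tool expects."""
    match = re.search(r"\b(?:[A-Z][a-z]?\d*){2,}\b", question)
    if match is None:
        raise ValueError("No molecular formula found in the question")
    return match.group(0)


def answer(question: str) -> str:
    """Step 4: run the selected tool and hand its output back to the user."""
    tool_name = select_tool(question)
    _, tool = TOOLS[tool_name]
    return f"{tool_name}: {tool(extract_formula(question))}"


if __name__ == "__main__":
    print(answer("What is the molecular weight of C2H6O?"))
    print(answer("How many heavy atoms are in C2H6O?"))
```

The point of the design is that the numerical answer always comes from a deterministic tool, never from the language model's recollection; the model's job is limited to routing and formatting.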

“LLMs by themselves don’t have the knowledge or reasoning skills to answer these kinds of questions correctly,” said Andrew McNaughton, first author of the research paper published in ACS Omega. “We developed very specific prompts that CACTUS uses to interpret questions behind the scenes and select the best tool to answer the question.”

Demonstrating the utility of CACTUS

In their published research, the PNNL team provided examples of how CACTUS can enhance both drug discovery and materials research. 

In drug discovery, CACTUS aids researchers by predicting molecular properties, assessing drug-likeness, and identifying potential off-target effects. This can accelerate the identification of promising compounds, reducing the time and cost associated with traditional drug discovery methods. By integrating the agent with automated experimentation platforms, CACTUS will be able to design and prioritize experiments, analyze results, and iteratively refine its hypotheses, which could lead to more efficient and targeted exploration of chemical space.
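To make "assessing drug-likeness" concrete, the sketch below applies one widely used heuristic, Lipinski's rule of five, to descriptors that are assumed to have been computed already. The article does not say which criteria CACTUS applies, so this is an illustration of the general idea rather than the agent's method; the `Descriptors` class, the function names, and the example values are all hypothetical or approximate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (ASSUMED to come from another tool)."""
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the rule-of-five criteria that the molecule violates."""
    violations: List[str] = []
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """A common reading of the rule tolerates at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Approximate, illustrative values for aspirin.
    aspirin = Descriptors(180.16, 1.2, 1, 4)
    # A made-up large, greasy molecule for contrast.
    hypothetical = Descriptors(900.0, 7.0, 6, 12)
    for name, mol in (("aspirin", aspirin), ("hypothetical", hypothetical)):
        print(name, is_drug_like(mol), lipinski_violations(mol))
```

A filter like this is cheap to run, which is why rule-based checks are often used to triage candidates before more expensive predictions or experiments.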

In materials science, CACTUS can facilitate the discovery of new materials with desired properties by exploring vast chemical spaces and predicting the performance of novel compounds. Integrating CACTUS with automated experimentation platforms will ultimately allow the agent to make data-driven decisions in real time, opening up new possibilities for autonomous discovery.

Rohith Varikoti, who co-authored the paper, recently demonstrated the utility of CACTUS at the Department of Energy booth at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC24). Though the agent is still evolving, its robust benchmarking shows that CACTUS has great potential to assist researchers in the design of molecules for specific applications in both chemistry and materials discovery.

“For our next step, we plan to connect CACTUS with additional tools to create a complete pipeline focused on small molecule discovery,” said Varikoti. 

With the capabilities of CACTUS as a foundation, autonomous molecular discovery in drug development and materials science is poised for significant transformation.

“The integration of CACTUS with automated experimentation platforms enables real-time data-driven decisions, streamlining the discovery process and paving the way for fully autonomous laboratories,” said Carter Knutson, who also co-authored the paper.

This research was supported by the I3T investment under the Laboratory Directed Research and Development program at PNNL. The initial concept of integrating LLMs and tools received support from the Exascale Computing Project under the Department of Energy, Office of Science, Advanced Scientific Computing Research program.