November 28, 2024
Journal Article

CACTUS: Chemistry Agent Connecting Tool Usage to Science

Abstract

Large language models (LLMs) have shown remarkable potential in various domains but often lack the ability to access and reason over domain-specific knowledge and tools. In this article, we introduce Chemistry Agent Connecting Tool Usage to Science (CACTUS), an LLM-based agent that integrates existing cheminformatics tools to enable accurate and advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama3-8b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b, Mistral-7b, and Llama3-8b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without a significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with widely used domain-specific tools provided by RDKit, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment.

Citation

McNaughton A.D., G. Sankar Ramalaxmi, A. Kruel, C.R. Knutson, R.A. Varikoti, and N. Kumar. 2024. CACTUS: Chemistry Agent Connecting Tool Usage to Science. ACS Omega 9, no. 46:46563–46573. PNNL-SA-197968. doi:10.1021/acsomega.4c08408