Conference Paper

Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design

Abstract

Discovering novel catalysts requires complex reasoning involving multiple chemical properties and the resulting trade-offs, leading to combinatorial growth of the search space. While large language models (LLMs) have demonstrated novel capabilities for chemistry through complex instruction following and high-quality reasoning, goal-driven combinatorial search using LLMs has not been explored in detail. In this work, we present a Monte Carlo Tree Search-based approach that improves beyond state-of-the-art chain-of-thought prompting variants to augment scientific reasoning. We introduce two new reasoning datasets: 1) a curation of computational chemistry simulations, and 2) diverse questions written by catalysis researchers for reasoning about novel chemical conversion processes. We improve over the best baseline by 25.8% and find that our approach can augment scientists' reasoning and discovery process with novel insights. All resources will be publicly available upon publication.
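The abstract names the core idea (Monte Carlo Tree Search over LLM-generated reasoning steps) without implementation details. The Python sketch below is only an illustrative outline of that general pattern under stated assumptions, not the authors' method: the functions propose_thoughts and score_state are hypothetical stand-ins for LLM calls that propose and evaluate candidate "thoughts" (here stubbed so the sketch runs on its own).

```python
import math
import random

# Hypothetical stand-ins for LLM calls: propose candidate next reasoning steps
# ("thoughts") and score a reasoning path. Stubbed here for a runnable sketch.
def propose_thoughts(state, k=3):
    return [state + [f"thought-{len(state)}-{i}"] for i in range(k)]

def score_state(state):
    return random.random()  # placeholder for an LLM- or simulation-based reward

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # sequence of reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Upper-confidence bound balancing exploitation and exploration.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state, iterations=50, depth=3):
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: ask the (stubbed) LLM for candidate next thoughts.
        if len(node.state) < depth:
            node.children = [Node(s, node) for s in propose_thoughts(node.state)]
            node = random.choice(node.children)
        # Evaluation: score the reasoning path.
        reward = score_state(node.state)
        # Backpropagation: push the reward back up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first-level expansion as the preferred next step.
    best = max(root.children, key=lambda n: n.visits)
    return best.state

if __name__ == "__main__":
    print(mcts(["initial prompt: propose a catalyst for a target conversion"]))
```

In an actual LLM-driven search, propose_thoughts would prompt the model for candidate reasoning continuations and score_state would query a reward signal (e.g., an LLM judge or a chemistry simulation); the tree-search loop itself would remain the same.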

Published: July 26, 2024

Citation

Sprueill, H. W., C. Edwards, M. V. Olarte, U. Sanyal, H. Ji, and S. Choudhury. 2023. "Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design." In Findings of the Association for Computational Linguistics: EMNLP 2023, December 6–10, 2023, Singapore, edited by H. Bouamor, J. Pino, and K. Bali, 8348–8365. PNNL-SA-188070. doi:10.18653/v1/2023.findings-emnlp.560