December 30, 2025
Conference Paper
Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations
Abstract
In this work, we develop several machine learning (ML)-based strategies to predict the resources (costs) required for massively parallel chemistry computations, such as coupled-cluster methods, to guide application users before they commit to running expensive experiments on a supercomputer. By predicting application execution time, we determine optimal values for runtime parameters such as the number of nodes and tile sizes. Using these optimal values, we strive to answer two key questions that help users make informed decisions. The first is the shortest-time question, in which the user wants to know the parameter configurations that yield the shortest time to solution for a given problem size and target supercomputer. The second is the budget question, in which the user wants to understand which problem sizes and parameter values respect their budget (in terms of walltime, node-hours, energy consumption, etc.). Our work offers a thorough evaluation of a rich family of ML models and strategies developed from collected runtime parameter values of Coupled Cluster with Singles and Doubles (CCSD) runs executed on the Frontier, Aurora, and Perlmutter supercomputers. Experiments show that when predicting the total execution time of CCSD, our Gradient Boosting model achieves a mean absolute percentage error (MAPE) of 0.023 and 0.073 on Aurora and Frontier, respectively. When running experiments solely to collect data points is expensive, we employ active and generative learning. With active learning, our model achieves a MAPE of about 0.2 using only about 450 experiments collected from Aurora and Frontier.
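To make the two questions concrete, here is a minimal sketch. It assumes a trained execution-time model is available as a plain function `predict(problem_size, nodes, tile_size)` that returns seconds. Under that assumption, the shortest-time question becomes an argmin over candidate (nodes, tile size) configurations. The budget question becomes a filter on predicted node-hours (nodes × predicted seconds / 3600), followed by picking the largest feasible problem size. The cost model `toy_predict`, all function names, and every constant below are hypothetical stand-ins for illustration only. They are not the paper's Gradient Boosting model, its features, or its measured data. MAPE is computed as a fraction, matching how the values above are reported.

```python
from itertools import product


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.023 means 2.3%)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def toy_predict(problem_size, nodes, tile_size):
    """HYPOTHETICAL stand-in for a trained runtime model; returns seconds."""
    compute = problem_size ** 4 / (nodes * 1e7)       # assumed: work splits across nodes
    comm = 50.0 * nodes                               # assumed: overhead grows with nodes
    tile_penalty = 100.0 * abs(tile_size - 64) / 64   # assumed: best tile size is 64
    return compute + comm + tile_penalty


def shortest_time(predict, problem_size, node_options, tile_options):
    """Shortest-time question: configuration with the lowest predicted time."""
    nodes, tile = min(product(node_options, tile_options),
                      key=lambda cfg: predict(problem_size, *cfg))
    return nodes, tile, predict(problem_size, nodes, tile)


def largest_within_budget(predict, sizes, node_options, tile_options, max_node_hours):
    """Budget question: largest problem size having at least one configuration whose
    predicted node-hours fit the budget, paired with the fastest such configuration."""
    for size in sorted(sizes, reverse=True):
        feasible = []
        for nodes, tile in product(node_options, tile_options):
            seconds = predict(size, nodes, tile)
            node_hours = nodes * seconds / 3600.0
            if node_hours <= max_node_hours:
                feasible.append((seconds, nodes, tile, node_hours))
        if feasible:
            # Tuples compare by predicted seconds first, so min() picks the fastest.
            seconds, nodes, tile, node_hours = min(feasible)
            return size, nodes, tile, seconds, node_hours
    return None  # no problem size fits the budget


if __name__ == "__main__":
    node_opts, tile_opts = [1, 2, 4, 8, 16], [32, 64, 128]
    print(shortest_time(toy_predict, 400, node_opts, tile_opts))
    print(largest_within_budget(toy_predict, [200, 400, 600], node_opts, tile_opts, 2.0))
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

With this toy model, more nodes shrink the compute term but grow the communication term. As a result, the fastest configuration sits at an intermediate node count rather than at the largest allocation. Tightening the node-hour budget pushes the answer toward fewer nodes or a smaller problem size. A walltime or energy budget could be handled the same way by swapping the quantity compared against the limit.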