February 15, 2024
Journal Article

Evaluating Uncertainty-Based Active Learning for Accelerating the Generalization of Molecular Property Prediction

Abstract

Deep learning models have proven to be powerful tools for predicting molecular properties in applications such as drug design and the development of energy storage materials. However, to learn accurate and robust structure-property mappings, these models require large amounts of data, which can be challenging to collect given the time- and resource-intensive nature of experimental material characterization. Additionally, such models fail to generalize to new types of molecular structures that were not included in the training data. Accelerating material development through uncertainty-guided experimental design promises to significantly reduce data requirements and enable faster generalization to new types of materials. To evaluate the potential of such approaches for electrolyte design applications, we perform a comprehensive evaluation of existing uncertainty quantification methods on the prediction of two relevant molecular properties: aqueous solubility and redox potential. We develop novel evaluation methods to probe the utility of the uncertainty estimates for both in-domain and out-of-domain data sets. Finally, we leverage selected uncertainty estimation methods for active learning to evaluate their capacity to support experimental design.
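
For readers unfamiliar with the general idea, uncertainty-based active learning can be sketched as a loop that trains a model, scores unlabeled candidates by predictive uncertainty, and requests labels for the most uncertain ones. The Python sketch below is illustrative only and is not the method evaluated in the article: it assumes a bootstrap ensemble of linear regressors as a stand-in for a deep model, uses the ensemble's standard deviation as the uncertainty estimate, and all function names (`fit_ensemble`, `predict_with_uncertainty`, `select_most_uncertain`, `active_learning_loop`) are hypothetical.

```python
import numpy as np


def fit_ensemble(X, y, n_models=10, seed=0):
    """Fit linear models on bootstrap resamples (a stand-in for a deep ensemble).

    Returns an array of weights with shape (n_models, n_features + 1).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        weights.append(w)
    return np.stack(weights)


def predict_with_uncertainty(weights, X):
    """Return the ensemble mean and standard deviation for each sample."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T  # shape: (n_samples, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def select_most_uncertain(weights, X_pool, batch_size):
    """Return indices (into X_pool) of the samples with the largest ensemble spread."""
    _, std = predict_with_uncertainty(weights, X_pool)
    return np.argsort(std)[::-1][:batch_size]


def active_learning_loop(X_pool, y_pool, n_init=10, batch_size=5, n_rounds=3, seed=0):
    """Grow a labeled set by repeatedly querying the most uncertain pool samples.

    y_pool plays the role of the experiment: a label is only "revealed"
    once its index has been added to the labeled set.
    """
    rng = np.random.default_rng(seed)
    labeled = rng.choice(len(X_pool), size=n_init, replace=False).tolist()
    for _ in range(n_rounds):
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        weights = fit_ensemble(X_pool[labeled], y_pool[labeled], seed=seed)
        picks = select_most_uncertain(weights, X_pool[unlabeled], batch_size)
        labeled.extend(unlabeled[picks].tolist())
    return labeled
```

In practice the linear ensemble would be replaced by whichever property-prediction model and uncertainty quantification method is under study; only the acquisition step (ranking candidates by estimated uncertainty) is common to the approach.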

Published: February 15, 2024

Citation

Yin T., G.U. Panapitiya, E.D. Coda, and E.G. Saldanha. 2023. Evaluating Uncertainty-Based Active Learning for Accelerating the Generalization of Molecular Property Prediction. Journal of Cheminformatics 15. PNNL-SA-179045. doi:10.1186/s13321-023-00753-5