August 19, 2018
Conference Paper

Using Rule-Based Models for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

Abstract

With access to large datasets, deep neural networks (DNNs) have achieved human-level accuracy in image and speech recognition tasks. In chemistry, however, datasets are inherently small and fragmented. In this work, we develop an approach for integrating rule-based knowledge into the Chemception CNN model via transfer learning techniques. The resulting model, ChemNet, is a transferable and generalizable deep neural network for chemical property prediction that learns in a semi-supervised manner from large unlabeled chemical databases. When ChemNet is further fine-tuned on three smaller datasets to predict chemical properties that it was not originally trained on, we show that it exceeds the accuracy of existing Chemception CNN models and other contemporary DNN models that were trained using conventional supervised learning approaches. These results indicate that pre-training ChemNet on a large, diverse chemical database while incorporating chemistry domain knowledge enables the development of more generalizable deep neural networks for the prediction of novel chemical properties.

Revised: February 10, 2021 | Published: August 19, 2018

Citation

Goh G.B., C.M. Siegel, A. Vishnu, and N.O. Hodas. 2018. Using Rule-Based Models for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2018), August 2018, London, UK, 302-306. New York, New York: Association for Computing Machinery. PNNL-SA-132274. doi:10.1145/3219819.3219838