Abstract
The global COVID-19 pandemic has underscored the importance of collaborative drug discovery for achieving high effectiveness. However, stringent data regulations make data privacy a pressing issue that must be addressed before such collaboration is possible. Efficiency is another key objective: infectious diseases can spread exponentially, so conducting drug discovery efficiently could save lives. Advanced artificial intelligence (AI) techniques are promising for both problems: (1) federated learning (FL) is designed to preserve data privacy while learning from data held by distributed clients; (2) graph neural networks (GNNs) can extract structural properties of molecules, whose underlying structure is a graph of connected atoms; and (3) generative adversarial networks (GANs) can generate novel molecules while retaining the properties learned from the training data. In this work, we make the first attempt to build a holistic, collaborative, and privacy-preserving FL framework, named FL-DISCO, which integrates a GAN and a GNN to generate molecular graphs. Experimental results demonstrate the effectiveness of FL-DISCO on (1) IID data for ESOL and QM9, where FL-DISCO generates highly novel compounds with high drug-likeness, uniqueness, and LogP scores compared to the baseline; and (2) non-IID data for ESOL and QM9, where FL-DISCO generates 100% novel compounds with high validity and LogP scores compared to the baseline. We also demonstrate how different fractions of clients and different generator and discriminator architectures affect our evaluation scores.
Published: March 18, 2022