Aqueous organic redox flow batteries offer an environmentally benign, tunable, and safe route to large-scale energy storage. Energy density, one of the key performance parameters of organic redox flow batteries, depends critically on the solubility of the redox-active molecule in water. Prediction of aqueous solubility remains a challenge in chemistry. Recently, machine learning models have been developed for molecular property prediction in chemistry and materials science. The fidelity of a machine learning model depends critically on the diversity, accuracy, and abundance of its training data. Existing solubility datasets cover only a small chemical space of organic molecules and most often fall in the low-solubility regime. We build a comprehensive open-access organic molecular database, “Solubility of Organic Molecules in Aqueous Solution” (SOMAS), containing about 12,000 molecules that cover wider chemical and solubility regimes suitable for aqueous organic redox flow battery development efforts. Multiple independent molecular identifiers are provided to minimize the duplicate rate in the database. In addition to experimental solubility, we also provide, for each organic molecule, six distinctive quantum descriptors (including optimized geometry) derived from high-throughput density functional theory calculations, along with six molecular descriptors. SOMAS builds a critical foundation for future efforts in artificial intelligence-based solubility prediction models.
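The role of multiple independent identifiers in reducing duplicates can be illustrated with a minimal sketch. It is not the SOMAS curation pipeline: the field names (`inchikey`, `smiles`, `cas`), the `deduplicate` helper, and the two-molecule sample are assumptions chosen for illustration. The idea is that two records written with different SMILES strings can still be recognized as the same molecule when another identifier matches.

```python
def deduplicate(records, id_fields=("inchikey", "smiles", "cas")):
    """Keep the first record of each molecule.

    A record counts as a duplicate if ANY of its non-empty identifiers
    was already seen on an earlier record (hypothetical field names).
    """
    seen = set()
    unique = []
    for rec in records:
        keys = set()
        for field in id_fields:
            value = (rec.get(field) or "").strip()
            if value:
                keys.add((field, value))
        is_duplicate = any(key in seen for key in keys)
        # Remember identifiers of duplicates too, so later variants still match.
        seen.update(keys)
        if not is_duplicate:
            unique.append(rec)
    return unique


# Illustrative records only; they are not taken from the SOMAS database.
records = [
    {"name": "ethanol", "smiles": "CCO",
     "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "cas": "64-17-5"},
    # Same molecule written with a different SMILES string.
    {"name": "ethyl alcohol", "smiles": "OCC",
     "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "cas": ""},
    {"name": "benzene", "smiles": "c1ccccc1",
     "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N", "cas": "71-43-2"},
]

unique = deduplicate(records)
unique_names = [rec["name"] for rec in unique]
print(unique_names)
```

Matching on a SMILES string alone would have kept both ethanol entries. This single pass depends on record order and does not merge the fields of duplicate records; a production workflow would more likely canonicalize structures with a cheminformatics toolkit first.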
Published: January 13, 2023
Citation
Gao P., A. Andersen, J.P. Sepulveda, G.U. Panapitiya, A.M. Hollas, E.G. Saldanha, and V. Murugesan, et al. 2022. SOMAS: a platform for data-driven material discovery in redox flow battery development. Scientific Data 9. PNNL-SA-161978. doi:10.1038/s41597-022-01814-4