June 20, 2025
Journal Article

Extracting Material Property Measurements from Scientific Literature With Limited Annotations

Abstract

Extracting material property data from scientific text is pivotal for advancing data-driven research in chemistry and materials science. However, the extensive annotation effort required to produce training data for Named Entity Recognition (NER) models for this task is often a barrier to developing specialized datasets. In this work, we present a comparative study of the conventional supervised NER methodology against alternative few-shot learning architectures and Generative Pre-trained Transformer (GPT) based approaches, which mitigate the need to label large training datasets. We find that GPT not only excels at directly extracting relevant material properties from limited examples but can also be used to enhance supervised learning through data augmentation. We supplement our findings with error and data quality assessments to provide a nuanced understanding of the factors impacting property measurement extraction.

Published: June 20, 2025

Citation

Kong J., G.U. Panapitiya, and E.G. Saldanha. 2025. Extracting Material Property Measurements from Scientific Literature With Limited Annotations. Journal of Chemical Information and Modeling 65, no. 10:4906–4917. PNNL-SA-194296. doi:10.1021/acs.jcim.4c01352