June 20, 2025
Journal Article
Extracting Material Property Measurements from Scientific Literature With Limited Annotations
Abstract
Extracting material property data from scientific text is pivotal for advancing data-driven research in chemistry and materials science. However, the extensive annotation effort required to produce training data for Named Entity Recognition (NER) models for this task is often a barrier to developing specialized datasets. In this work, we present a comparative study of the conventional supervised NER methodology against alternative few-shot learning architectures and Generative Pre-trained Transformer (GPT) based approaches, which mitigate the need to label large training datasets. We find that GPT not only excels at directly extracting relevant material properties from a limited number of examples but can also be used to enhance supervised learning through data augmentation. We supplement our findings with error and data quality assessments to provide a nuanced understanding of the factors that affect property measurement extraction.
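To make the annotation burden concrete, the sketch below converts character-span annotations of one sentence into token-level BIO tags, a format commonly used to train supervised NER models. It is a minimal illustration under stated assumptions, not the authors' pipeline: the label set (`MATERIAL`, `PROPERTY`, `VALUE`, `UNIT`), the regex tokenizer, the example sentence, and the helper names `span_of` and `spans_to_bio` are all hypothetical.

```python
import re
from typing import List, Tuple

# (start_char, end_char_exclusive, label); the labels used below are hypothetical
Span = Tuple[int, int, str]

# Illustrative tokenizer: words/numbers (keeping decimals such as "3.37" whole)
# or single punctuation marks. Real pipelines typically use a domain tokenizer.
TOKEN_RE = re.compile(r"\w+(?:\.\w+)*|[^\w\s]")


def span_of(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: locate the first occurrence of `phrase` and label it."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert character-level entity spans into (token, BIO tag) pairs."""
    tagged = []
    for match in TOKEN_RE.finditer(text):
        start, end = match.span()
        tag = "O"  # outside any entity by default
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                # The first token of an entity gets B-, later tokens get I-
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    sentence = "The band gap of ZnO is 3.37 eV at room temperature."
    annotations = [
        span_of(sentence, "band gap", "PROPERTY"),
        span_of(sentence, "ZnO", "MATERIAL"),
        span_of(sentence, "3.37", "VALUE"),
        span_of(sentence, "eV", "UNIT"),
    ]
    for token, tag in spans_to_bio(sentence, annotations):
        print(f"{token}\t{tag}")
```

A supervised NER model needs spans like these for every training sentence, which is the labeling cost that few-shot and GPT-based approaches aim to reduce.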