Extracting Material Property Measurements from Scientific Literature With Limited Annotations

June 20, 2025

Journal Article

Abstract

Extracting material property data from scientific text is pivotal for advancing data-driven research in chemistry and materials science. However, the extensive annotation effort required to produce training data for Named Entity Recognition (NER) models for this task is often a barrier to developing specialized data sets. In this work, we present a comparative study of the conventional supervised NER methodology against alternative few-shot learning architectures and Generative Pre-trained Transformer (GPT) based approaches, which mitigate the need to label large training data sets. We find that GPT not only excels at directly extracting relevant material properties from limited examples but can also be used to enhance supervised learning through data augmentation. We supplement our findings with error and data quality assessments to provide a nuanced understanding of the factors affecting property measurement extraction.

Published: June 20, 2025

Citation

Kong J., G.U. Panapitiya, and E.G. Saldanha. 2025. Extracting Material Property Measurements from Scientific Literature With Limited Annotations. Journal of Chemical Information and Modeling 65, no. 10:4906–4917. PNNL-SA-194296. doi:10.1021/acs.jcim.4c01352

Research topics

Precision Materials by Design

Materials Sciences
