April 3, 2025
Conference Paper

Unlocking Scholarly Insights: Leveraging Machine Learning Approaches for Citation Analysis and Intent Classification

Abstract

Publicly funded organizations, notably institutions like the Los Alamos National Laboratory (LANL), have a vested interest in acquiring robust productivity metrics to gauge the entirety of their research output. Motivated by the need to improve institutional productivity assessment, this study investigates the use of Large Language Models (LLMs), including BERT-based models as well as local LLaMa-30b-instruct and Mixtral-8x7b-instruct architectures, for classifying the type of URL-referenced resources in academic papers, such as software and datasets, as well as authorship intent. Challenges in discerning resource types from context are highlighted, along with the potential of BERT and LLMs to address them. The analysis reveals a notable surge in documents featuring URL citations, indicating the growing importance of digital resources in scholarly publications. Moreover, citations to datasets and software show consistent growth over time, underscoring their increasing significance. Our findings also reveal that LANL authors contribute substantially to accessible science, comprising about 10% of dataset and software mentions in LANL

Published: April 3, 2025

Citation

Balakireva L., and M. Klein. 2025. Unlocking Scholarly Insights: Leveraging Machine Learning Approaches for Citation Analysis and Intent Classification. In Proceedings of the 24th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2024), December 16-20, 2024, Hong Kong, China, 1-5; Paper No. 26. New York, New York: Association for Computing Machinery. PNNL-SA-202009. doi:10.1145/3677389.3702514