April 3, 2025
Conference Paper
Unlocking Scholarly Insights: Leveraging Machine Learning Approaches for Citation Analysis and Intent Classification
Abstract
Publicly funded organizations such as the Los Alamos National Laboratory (LANL) have a strong interest in robust productivity metrics that capture the full breadth of their research output. Motivated by the need to improve institutional productivity assessment, this study investigates the use of BERT-based models and locally run large language models (LLMs), namely LLaMa-30b-instruct and Mixtral-8x7b-instruct, to classify the type of resource referenced by URLs in academic papers, such as software or datasets, as well as authorship intent. We highlight the challenges of discerning resource types from context and the potential of BERT and LLMs to address them. Our analysis shows a notable increase in documents that contain URL citations, indicating the growing importance of digital resources in scholarly publications. Citations to datasets and software also grow consistently over time, underscoring their increasing significance. Our findings further show that LANL authors contribute substantially to accessible science, accounting for about 10% of dataset and software mentions in LANL
Published: April 3, 2025