Data Scientist

Biography

Anurag Acharya is a data scientist at Pacific Northwest National Laboratory, where his research focuses on the evaluation, benchmarking, and applied development of large language models (LLMs) for scientific, regulatory, and national security domains.

Acharya leads the Models and Benchmarks thrust of PermitAI in its original scope and the Data thrust in its expansion scope. PermitAI develops AI-driven capabilities to support environmental review under the National Environmental Policy Act (NEPA) and, more recently, state-level permitting for geothermal and critical mineral development. He serves as principal investigator on Apollo, an agentic scientific copilot for user facilities developed in collaboration with SLAC National Accelerator Laboratory and funded by the DOE Basic Energy Sciences program. He also contributes as AI lead on multiple efforts under the DOE Office of Environmental Management, including the development of LLM-based decision support tools for environmental remediation and AI components for conceptual site modeling.

Earlier work at PNNL includes leading the development of the first expert-curated evaluation benchmark for LLMs in the nuclear nonproliferation domain under EXPERT; training foundation models for molecular chemistry and developing vulnerability classifiers for cyber defense under MegaAI; and contributing to Accelerate and Theseus as an LLM researcher and advisor on scientific evaluation benchmarks, respectively.

Acharya holds a PhD in computer science from Florida International University, where he also taught natural language processing and conducted research on cultural knowledge in AI, narrative extraction, and AI bias. He advises PhD and undergraduate interns at PNNL and serves on review and program committees for venues including NeurIPS and ACL, as well as workshops focused on responsible and applied AI.
 

Disciplines and Skills

  • AI
  • Large language models
  • AI evaluation and benchmarking
  • Natural language processing
  • Generative AI
  • Agentic AI systems
  • AI safety and trustworthiness
  • Computational linguistics
  • Computational social sciences

Education

  • PhD in computer science, Florida International University
  • MS in computer science, Florida International University
  • BEng in computer engineering, Tribhuvan University, Nepal
  • BA in English and political science, Tribhuvan University, Nepal

Affiliations and Professional Service

Professional Membership

  • Association for the Advancement of Artificial Intelligence
  • Association for Computational Linguistics
  • Association for Computing Machinery

Program and Organizing Committee

  • Organizing Committee, Social Development through NLP-driven Interdisciplinary Collaborations (SocioNLP) Workshop, 2024–2025
  • Ethics Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2023–present
  • Reviewer, Association for Computational Linguistics (ACL) Rolling Review, 2024–present
  • Program Committee, Workshop on Responsible Language Models (ReLM), 2024
  • Program Committee, The 43rd International Conference on Conceptual Modeling, 2024
  • Session Chair, Ninth Annual Conference on Advances in Cognitive Systems, 2021
  • Organizing Committee, Communicating Science Workshop for Graduate Students, 2021

Review Committee (Journals)

  • Natural Language Engineering, 2023–present
  • IEEE Transactions on Artificial Intelligence, 2023–present
  • Humanities & Social Sciences Communications, 2023–present
  • International Journal of Data Science and Analytics, 2023–present

Awards and Recognitions

  • Best Paper Award, Advanced Engineering and ICT-Convergence Proceedings, for “Transfer Learned Mobilenets with Shrinking Hyperparameters for Classifying Covid-19 Based on X-ray Images,” 2021

Publications

2026

  • Acharya A., B. Lakha, R. Meyur, R. Nuttall, S. Chaturvedi, A. Halappanavar, L. Hare, L. Zeng, M. Parker, S. Munikoti, and Y.S. Horawalavithana. 2026. “DraftNEPABench: A Benchmark for Drafting NEPA Document Sections with Coding Agents.” ACM Conference on AI and Agentic Systems (CAIS 2026). PNNL-SA-220871. https://dl.acm.org/doi/10.1145/3786335.3813132
  • Somasekharan N., L. Yue, Y. Cao, W. Li, P. Emami, P.S. Bhargav, A. Acharya, X. Xie, and S. Pan. 2026. “CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics.” Journal of Data-centric Machine Learning Research Special Track. PNNL-SA-211417. 

2025

  • Chaturvedi S., A. Acharya, R. Meyur, K. Hayashi, S. Munikoti, and Y.S. Horawalavithana. 2025. “Evaluating the Robustness of Dense Retrievers in Interdisciplinary Domains.” KDD Workshop on Evaluation and Trustworthiness of Agentic and Generative AI Models, ACM KDD 2025. Toronto, Canada.
  • Meyur R., H. Phan, K. Hayashi, I. Stewart, S. Sharma, S. Chaturvedi, M. Parker, D. Nally, S. Montgomery, A. Jannesari, K. Pazdernik, M. Halappanavar, S. Munikoti, Y.S. Horawalavithana, and A. Acharya. 2025. “Benchmarking LLMs for Environmental Review and Permitting.” Workshop on Large Language Models for Scientific and Societal Advances, ACM KDD 2025. Toronto, Canada.
  • Meyur R., H. Phan, S. Wagle, J. Strube, M. Halappanavar, Y.S. Horawalavithana, A. Acharya, and S. Munikoti. 2025. “WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain.” Fourth Workshop on NLP for Positive Impact, The 63rd Annual Meeting of the Association for Computational Linguistics (ACL). Vienna, Austria.
  • Wagle S., S. Munikoti, R. Meyur, J.M. Whiting, H.K. Farr, A. Acharya, Y.S. Horawalavithana, M. Halappanavar, J. Strube, and L.M. Fierce. 2025. “Leveraging Multimodal AI for Efficient Data Discovery in Wind Energy Research.” ACM Practice and Experience in Advanced Research Computing Conference. Columbus, Ohio.
  • Stam C., E.G. Saldanha, M. Halappanavar, and A. Acharya. 2025. “Leveraging Language Modeling and Dynamic Social Network Analysis to Recognize Patterns in the Spread of COVID-19 Misinformation Narratives on Social Media.” SocioNLP Workshop, 18th ACM International Conference on Pervasive Technologies Related to Assistive Environments (PETRA). Corfu, Greece.
  • Glenski M.F., R.J. Cosbey, S. Sharma, M. Subramanian, A. Acharya, and E.M. Ayton. 2025. Mega AI: Scaling AI for Science and Security. PNNL-37221. Richland, WA: Pacific Northwest National Laboratory.
  • Munikoti S., D.M. Nally, S.D. Koneru, S. Das, K. Bhattacharjee, A.C. Buchko, and T.C. Edwards, et al. 2025. NEPATEC v2.0: Standardized Metadata and Text Corpus of National Environmental Policy Act Documents. PNNL-38163. Richland, WA: Pacific Northwest National Laboratory.
  • Nally D.M., M.J. Parker, M. Aumeier, K. Murphy, M. Rau, J. McWalter, and J. Titus, et al. 2025. Workshop Summary Report on Using AI Tools to Improve the Efficiency and Outcomes of the NEPA Process: AI for Permitting Workshop at the 2025 National Association of Environmental Professionals (NAEP) Annual Conference. PNNL-37758. Richland, WA: Pacific Northwest National Laboratory.
  • Saldanha E.G., A. Acharya, M. Ocal, J. Eshun, M.F. Glenski, and S. Volkova. 2025. "Detecting and Summarizing Narratives in the Information Environment: A Case Study of Misinformation and Disinformation Campaigns." In Detecting Online Propaganda and Misinformation, edited by Mark Last, Marina Litvak, and Miao Lin. World Scientific. PNNL-SA-171527. doi:10.1142/13556

2024

  • Munikoti S., A. Acharya, S.N. Wagle, and Y.S. Horawalavithana. 2024. "Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning." In Proceedings of the 4th Workshop on Scholarly Document Processing (SDP 2024), August 16, 2024, Bangkok, Thailand, edited by T. Ghosal, et al., 84–89. Kerrville, Texas: Association for Computational Linguistics. PNNL-SA-189029.
  • Yarlott W.H., A. Acharya, D. Castro Estrada, D. Gomez, and M.A. Finlayson. 2024. "GOLEM: GOld standard for Learning and Evaluation of Motifs." In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 20–24, 2024, Torino, Italy, edited by N. Calzolari, et al., 7801–7813. Kerrville, Texas: Association for Computational Linguistics. PNNL-SA-191514.
  • Acharya A., D. Castro Estrada, S. Dahal, W.H. Yarlott, D. Gomez, and M.A. Finlayson. 2024. "Discovering Implicit Associations of Cultural Motifs from Text." Sixth Workshop on NLP and Computational Social Science (NLP+CSS), 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico. PNNL-SA-193395.
  • Munikoti, S., A. Acharya, S. Wagle, and Y. S. Horawalavithana. 2024. “ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science.” Workshop on AI to Accelerate Science and Engineering, The Thirty-Eighth Annual AAAI Conference on Artificial Intelligence. Vancouver, Canada.
  • Wagle, S., S. Munikoti, A. Acharya, S. Smith, and Y. S. Horawalavithana. 2024. “Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science.” Workshop on Scientific Document Understanding, The Thirty-Eighth Annual AAAI Conference on Artificial Intelligence. Vancouver, Canada.

2021

  • Yarlott, W.V.H., A. Ochoa, A. Acharya, L. Bobrow, D. Castro-Estrada, D. Gomez, J. Zheng, D. McDonald, C. Miller, and M.A. Finlayson. 2021. “Finding Trolls Under Bridges: Preliminary Work on a Motif Detector.” Advances in Cognitive Systems. Virtual Conference.
  • Yarlott, W.V.H., A. Ochoa, A. Acharya, L. Bobrow, D. Castro-Estrada, D. Gomez, J. Zheng, D. McDonald, C. Miller, and M.A. Finlayson. 2021. “AI models for detecting motifs in a text collection.” Literature & Culture and/as Intelligent Systems. Stuttgart, Germany.
  • Acharya, A., K. Talamadupula, and M.A. Finlayson. 2021. “Towards an Atlas of Cultural Commonsense for Machine Reasoning.” Workshop on Common Sense Knowledge Graphs, The Thirty-Fifth AAAI Conference on Artificial Intelligence. Virtual Conference.
  • KC, K., A. Acharya, A. Acharya, and S. Shrestha. 2021. “Transfer Learned Mobilenets with shrinking hyperparameters for classifying Covid-19 based on X-ray images.” Advanced Engineering and ICT-Convergence Proceedings. Vol. 4, No. 2. Bangkok, Thailand.