Data Scientist


Shivam Sharma is a data scientist in Pacific Northwest National Laboratory’s Physical and Computational Science directorate. He conducts research in artificial intelligence (AI) and natural language processing, focusing on the development and evaluation of large language models and their applications in scientific domains. He also implements generative AI to build conversational AI assistants for multiple projects and databases (e.g., Livewire and NEPA documents). He received his MS in data science from the New Jersey Institute of Technology in 2021.

Research Interests

  • Artificial intelligence
  • Natural language processing
  • Large language models
  • Uncertainty quantification
  • Generative models
  • Computational linguistics

Education


  • MS in data science, New Jersey Institute of Technology
  • BTech in computer and communication engineering, LNM Institute of Information Technology

Publications



  • Horawalavithana, Y. S., E. M. Ayton, S. Sharma, S. A. Howland, M. Subramanian, S. W. Vasquez, R. J. Cosbey, et al. 2022. "Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned." In Proceedings of BigScience Episode #5 — Workshop on Challenges & Perspectives in Creating Large Language Models, May 2022, Virtual and Dublin, Ireland, 160–172. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-171279. doi:10.18653/v1/2022.bigscience-1.12
  • Horawalavithana, S., E. Ayton, A. Usenko, S. Sharma, J. Eshun, R. Cosbey, M. Glenski, and S. Volkova. 2022. "EXPERT: Public Benchmarks for Dynamic Heterogeneous Academic Graphs." arXiv preprint: arXiv:2204.07203.


  • Duskin, K. R., S. Sharma, J. Yun, E. G. Saldanha, and D. L. Arendt. 2021. "Evaluating and Explaining Natural Language Generation with GenX." In Workshop on Data Science with Human-in-the-loop: Language Advances (DaSH-LA) colocated with NAACL 2021, June 11, 2021, Virtual, Online, edited by E. Dragut, et al., 70–78. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-159018. doi:10.18653/v1/2021.dash-1.12
  • Sharma, S., and C. Buntain. 2021. "An Evaluation of Twitter Datasets from Non-Pandemic Crises Applied to Regional COVID-19 Contexts." 18th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2021, 808–815.


  • Sharma, S., and C. Buntain. 2020. "Improving Classification of Crisis-Related Social Media Content via Text Augmentation and Image Analysis." Text Retrieval Conference 2020.


  • Bhatt, G., A. Sharma, S. Sharma, A. Nagpal, B. Raman, and A. Mittal. 2018. "Combining Neural, Statistical and External Features for Fake News Stance Identification." Companion Proceedings of the The Web Conference 2018, April 2018, 1353–1357. doi:10.1145/3184558.3191577