
Anuj R. Shah


Dr. Shah is a Senior Research Scientist in the Scientific Data Management group at Pacific Northwest National Laboratory (PNNL). His expertise spans bioinformatics, heterogeneous data integration, machine learning, high-performance computing, software architecture, and algorithm development. He is the principal investigator on a PNNL Laboratory Directed Research and Development project under the Data Intensive Computing Initiative, developing high-performance data analysis pipelines for streaming data. His current work includes deisotoping algorithms for proteomics data and the application of machine learning frameworks to predict various properties of peptides identified through mass spectrometry-based proteomics. He has made significant contributions to several publicly available software tools and algorithms, including the Bioinformatics Resource Manager (BRM), Data Fusion, SVM-Hustle, and STEPP.

Dr. Shah has more than six years of experience working on interdisciplinary teams that support biological research, and he has authored or co-authored more than 15 peer-reviewed journal and conference papers.

Education

  • Ph.D., Computer Science and Bioinformatics, Washington State University, May 2008
  • M.S., Computer Science and Applications, Virginia Polytechnic Institute and State University, May 2003
  • B.E., Computer Engineering, University of Mumbai, India, June 2001

Honors and Awards

  • Finalist, Analytics Challenge, Supercomputing 2006
  • First Prize, IN-SPIRE InfoVis 2004 Contest Entry
  • Graduate Teaching Assistantship, Virginia Tech, 2002-2003

Selected Publications

Shah AR, M Singhal, TD Gibson, C Sivaramakrishnan, KM Waters, and I Gorton. 2008. “An Extensible, Scalable Architecture for Managing Bioinformatics Data and Analyses.” In 2008 Fourth IEEE International Conference on eScience, pp. 190-197.

Webb-Robertson BM, WR Cannon, CS Oehmen, AR Shah, V Gurumoorthi, MS Lipton, and KM Waters. 2008. “A Support Vector Machine Model for the Prediction of Proteotypic Peptides for Accurate Mass and Time Proteomics.” Bioinformatics (published online May 3, 2008).

Shah AR, CS Oehmen, and BM Webb-Robertson. 2008. “SVM-Hustle – An Iterative Semi-Supervised Machine Learning Approach for Protein Remote Homology Detection.” Bioinformatics 24(6):783-790 (March 15, 2008).

Shah AR, VM Markowitz, and CS Oehmen. 2007. “High Throughput Computation of Pairwise Sequence Similarities for Multiple Genome Comparisons Using ScalaBLAST.” In Life Sciences Systems and Applications Workshop, November 8-9, 2007, pp. 89-91.

Shah AR, M Singhal, KR Klicker, EG Stephan, HS Wiley, and KM Waters. 2007. “Enabling High-Throughput Data Management for Systems Biology: The Bioinformatics Resource Manager.” Bioinformatics 23(7):906-909.
