Conference Paper

Bringing high performance computing to the biologist’s workbench: approaches, applications and challenges

Abstract

Data-intensive and high-performance computing are poised to significantly impact the future of biological research, which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers such as the NIH's National Center for Biotechnology Information (NCBI), The Institute for Genomic Research (TIGR), and the DOE Joint Genome Institute's (JGI) Integrated Microbial Genomes (IMG) system have made extensive use of multiprocessor architectures to address the challenges of processing, storing, and curating exponentially growing genomic and proteomic datasets, enabling end users both to rapidly access a growing body of public data and to run analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often requires users to provide their own computational resources and to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges on multiprocessor machines, from small clusters to emerging petascale systems, can most practically be realized if this learning curve is minimized through a combination of workflow management, data management, and resource allocation, together with intuitive interfaces and compatibility with common existing data formats.

Revised: September 30, 2008 | Published: September 1, 2008

Citation

Oehmen, C.S., and W.R. Cannon. 2008. Bringing high performance computing to the biologist's workbench: approaches, applications and challenges. In SciDAC 2008: Journal of Physics: Conference Series, 125, 012052. Bristol: IOP Publishing Ltd. PNNL-SA-61076. doi:10.1088/1742-6596/125/1/012052