November 1, 2008
Journal Article

It Takes Glue to Tango: MeDICi integration framework creates data-intensive computing pipeline

Abstract

Biologists increasingly rely on high-performance computing (HPC) platforms to rapidly process the tsunami of data generated by high-throughput genome and metagenome sequencing technology and high-throughput proteomics. Unfortunately, the platforms that produce the massive data sets rarely work smoothly with the interactive analysis and visualization programs used in bioinformatics. This makes it difficult for researchers to exploit the computational power of HPC platforms to speed scientific discovery. At the Department of Energy’s Pacific Northwest National Laboratory in Richland, Wash., researchers are creating computing environments for biologists that seamlessly integrate collections of data and computational resources. These environments enable users to rapidly analyze high-throughput data. A major goal is to shield the biologist from the complexity of interacting with multiple dissimilar databases and running tasks on HPC platforms and computational clusters. One of those environments, the MeDICi Integration Framework, is now available for free download. Short for Middleware for Data-Intensive Computing, MeDICi makes it easy to integrate separate codes into complex applications that operate as a data analysis pipeline.

Revised: June 10, 2010 | Published: November 1, 2008

Citation

Gorton I., C.S. Oehmen, and J.E. McDermott. 2008. It Takes Glue to Tango: MeDICi integration framework creates data-intensive computing pipeline. Scientific Computing 25, no. 7:16-24. PNNL-SA-62541.