Scientific applications are often structured as workflows that execute a series of distributed software modules to analyze large data sets. Such workflows are typically constructed using general-purpose scripting languages to coordinate the execution of the various modules and to exchange data sets between them. While such scripts provide a cost-effective approach for simple workflows, they quickly become complex and difficult to modify as the workflow structure becomes more intricate and evolves. This makes them a major barrier to easily and quickly deploying new algorithms and exploiting new, scalable hardware platforms. In this paper, we describe the MeDICi Workflow technology, which is specifically designed to reduce the complexity of workflow application development and to efficiently handle data-intensive workflow applications. MeDICi integrates standard component-based and service-based technologies, and employs an efficient integration mechanism to ensure that large data sets can be processed efficiently. We illustrate the use of MeDICi with a climate data processing example that we have built, and describe some of the new features
Revised: April 14, 2011
Published: June 1, 2009
Citation
Gorton I., J.M. Chase, A.S. Wynne, J.P. Almquist, and A.R. Chappell. 2009. Services + Components = Data Intensive Scientific Workflow Applications with MeDICi. In Component Based Software Engineering: 12th International Symposium (CBSE 2009), June 24-26, 2009, East Stroudsburg, PA. Lecture Notes in Computer Science, edited by GA Lewis, I Poernomo and C Hofmeister, 5582, 227-241. Berlin: Springer-Verlag. PNNL-SA-65181. doi:10.1007/978-3-642-02414-6_14