New Workflow Approach Enables Development of Flexible Algorithms
Solution could greatly simplify the construction of workflow applications
Combining MeDICi with the Kepler workflow enables researchers to focus on scientific discovery.
Results: Scientists at Pacific Northwest National Laboratory have demonstrated a proof-of-concept workflow that enables scientists to focus on algorithm development and scientific discovery. This prototype was successfully demonstrated for an atmospheric sciences application.
Why it matters: Scientists rely on workflow applications to conduct their research. These applications are complex: they involve many steps and draw on a wide range of software and execution platforms. In addition, the data sets to be processed are growing rapidly, so workflows must be modified regularly to scale to new data and processing challenges. The success of PNNL's workflow prototype means less time from concept to implementation and better opportunities for sharing research results.
Methods: The prototype combines PNNL's Middleware for Data Intensive Computing (MeDICi) with the Kepler workflow environment, developed by DOE's Scientific Discovery through Advanced Computing (SciDAC) Scientific Data Management Center.
The workflow was implemented for a complex application in DOE's Atmospheric Radiation Measurement (ARM) Program. It includes a framework that lets scientists visually create and modify an individual data-processing pipeline, as well as support for combining scientific algorithms, tools, and libraries.
The application also shows how MeDICi's broad integration capabilities complement the Kepler workflow tools. The result promotes a strong separation of concerns, simplifying the Kepler workflow description and promoting the creation of a reusable collection of components available for other workflow applications in this domain.
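The separation of concerns described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not use the MeDICi or Kepler APIs; every name in it (drop_missing, to_kelvin, Pipeline) is hypothetical. The idea it tries to show is that the processing components live in a reusable collection, while the workflow description only lists which components run and in what order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types and names for illustration only;
# this is not the MeDICi or Kepler API.
Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def drop_missing(records: List[Record]) -> List[Record]:
    """Reusable component: discard records that carry no measurement."""
    return [r for r in records if r.get("value") is not None]


def to_kelvin(records: List[Record]) -> List[Record]:
    """Reusable component: convert Celsius readings to kelvin."""
    return [{**r, "value": r["value"] + 273.15} for r in records]


@dataclass
class Pipeline:
    """Workflow description: an ordered list of components.

    It holds no processing logic of its own, so changing the
    workflow means editing the list, not the components.
    """
    steps: List[Step] = field(default_factory=list)

    def run(self, records: List[Record]) -> List[Record]:
        # Feed the output of each component into the next one.
        for step in self.steps:
            records = step(records)
        return records


# Assemble a workflow from the shared components and run it
# on two made-up sample records.
pipeline = Pipeline(steps=[drop_missing, to_kelvin])
result = pipeline.run([{"value": 10.0}, {"value": None}])
```

In this sketch, the record with no measurement is dropped and the remaining reading is converted to kelvin. A different workflow could reuse the same two components in another order, or alongside new ones, without changing their code.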
What's next: Researchers will further investigate workflow solutions built using Kepler and MeDICi. The ARM Program is rich with workflow applications, known as value-added products (VAPs). A solution that makes these applications markedly easier to modify and reuse, and that provides a platform for scaling both processing and data size, is therefore attractive. Researchers will also benchmark the solutions and extend the designs to automate the execution of interrelated VAPs that run on distributed compute resources across North America.
Acknowledgment: Kepler Open Source Community
Sponsors: DOE SciDAC Scientific Data Management Center; PNNL Data Intensive Computing Initiative; DOE Atmospheric Radiation Measurement Program