Data Provenance Hybridization Supporting Extreme-Scale Scientific Workflow Applications

November 21, 2016

Conference Paper

Abstract

As high-performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g., grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling and their pre- and post-processing needs. To gain insight into and a more quantitative understanding of a workflow’s performance, our method captures not only traditional provenance information but also system environment metrics, which it integrates to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) and its hybrid data store, which combines both of these data provenance perspectives.

Revised: February 15, 2017 | Published: November 21, 2016

Citation

Elsethagen T.O., E.G. Stephan, B. Raju, M. Schram, M.C. Macduff, D.J. Kerbyson, and K. Kleese-Van Dam, et al. 2016. Data Provenance Hybridization Supporting Extreme-Scale Scientific Workflow Applications. In New York Scientific Data Summit (NYSDS 2016), August 14-17, 2016, New York. Piscataway, New Jersey: IEEE. PNNL-SA-119959. doi:10.1109/NYSDS.2016.7747819