Global climate researchers rely on many forms of sensor data and analytical methods to profile subtle changes in climate conditions. The U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) program provides researchers with curated Value Added Products (VAPs) resulting from continuous sensor data streams, data fusion, and modeling. The ARM operations staff and software development teams (data producers) rely on a number of techniques to ensure that strict quality control (QC) and quality assurance (QA) standards are maintained. Climate researchers (data consumers) are highly interested in obtaining as much provenance (data quality, data pedigree) as possible to establish data trustworthiness. Currently, much of this provenance is not easily attainable or identifiable without significant effort to extract and piece together information from configuration files, log files, codes, and status information from ARM databases. The need for a formalized approach to managing provenance became paramount with the planned addition of 120 new instruments, new data products, and data collection scaling to half a terabyte daily. Last year our research identified the need for a multi-tier provenance model to give data consumers easy access to the provenance for their data. This year we are leveraging the Open Provenance Model as a foundational construct that serves the needs of both VAP producers and consumers. We are organizing the provenance in tiers of differing granularity to model VAP lineage, causality at the component level within a VAP, and causality for each time step as samples are assembled within the VAP. This paper shares our implementation strategy and describes how the ARM operations staff and the climate research community can benefit from this approach to more effectively assess and quantify VAP provenance.
Revised: February 10, 2012
Published: December 8, 2010
Citation
Stephan E.G., T.D. Halter, and B.D. Ermold. 2010. Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate Research. In Provenance and Annotation of Data and Processes - Third International Provenance and Annotation Workshop, IPAW 2010, June 15-16, 2010, Troy, New York. Lecture Notes in Computer Science, edited by DL McGuinness, et al., 6378, 34-41. Berlin: Springer. PNNL-SA-71579. doi:10.1007/978-3-642-17819-1