Efficient Extraction of Regional Subsets from Massive Climate Datasets using Parallel IO

September 16, 2010

Conference Paper

Abstract

The size of datasets produced by current climate models is increasing rapidly to the petabyte scale. Handling data at this scale requires parallel analysis tools, yet most climate analysis software is still designed for workstations. Further, many climate analysis tools process regularly gridded data adequately but lack the features needed to handle unstructured grids. This paper presents a data-parallel subsetter capable of correctly handling unstructured grids while scaling to over 2000 cores. The approach is based on the partitioned global address space (PGAS) parallel programming model and one-sided communication. The paper demonstrates that IO remains the single greatest bottleneck for this domain of applications and that parallel analysis of climate data succeeds in practice.

Revised: December 27, 2010 | Published: September 16, 2010

Citation

Daily J.A., K.L. Schuchardt, and B.J. Palmer. 2010. Efficient Extraction of Regional Subsets from Massive Climate Datasets using Parallel IO. In American Geophysical Union, Fall Meeting 2010, Paper No. IN41A-1360. Washington, DC: American Geophysical Union. PNNL-SA-71307.