Data Intensive Scientific Computing (DISC)
The Data Intensive Scientific Computing (DISC) group creates breakthrough technologies that address key data intensive computing problems across a range of science and engineering domains. This requires designing, building, and deploying novel software frameworks and systems that can integrate and manage diverse data sets and computational tools in a scalable, flexible fashion. Our work spans a range of software technologies, including advanced graphical user interfaces, distributed software integration and workflow frameworks, customizable knowledge management platforms for modeling and simulation, and system-level tools for high-performance data capture and processing. We apply these technologies in many scientific and engineering domains, including subsurface modeling, carbon sequestration, bioinformatics, climate modeling, and the power grid.
DISC comprises 25 computer scientists working on a wide range of projects for scientists and engineers both within PNNL and in the external community. Some examples are:
Velo: A flexible, scalable knowledge management system designed to support the full lifecycle of modeling and simulation in any science domain and at any scale.
ASCEM: DISC leads the Platform Thrust in the DOE multi-lab Advanced Simulation Capability for Environmental Management project. This includes Akuna, an advanced modeling, analysis, and data management platform that can be used to create numerical models of the subsurface to support environmental remediation.
GS3: The Geologic Sequestration Software Suite is built upon Velo to provide a full life-cycle modeling and simulation platform for carbon sequestration projects. GS3 is deployed and in use for several DOE projects involved in carbon sequestration modeling.
GCRM: The Global Cloud Resolving Model (GCRM) project provides data services for high-resolution climate models, with a particular emphasis on models discretized on the geodesic grid. We have developed a flexible I/O layer for the simulators, which includes an I/O agent capable of identifying areas of special interest, such as tropical cyclones, and writing output for those areas at very high temporal resolution. We have also developed standards for representing the data in NetCDF-formatted files, as well as fully data-parallel analysis tools. More information and access to software can be found on our Wiki.
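The I/O-agent idea above can be illustrated with a small sketch. This is not GCRM code; the field values, threshold, and output intervals are all invented for illustration. The agent scans a 2D field each timestep, flags cells exceeding a threshold (a stand-in for tropical-cyclone detection), and schedules region-of-interest output at a higher temporal frequency than full-field snapshots.

```python
# Illustrative sketch (not GCRM code): an I/O "agent" that flags cells
# of special interest and writes them more frequently than full snapshots.

def find_regions_of_interest(field, threshold):
    """Return (row, col) indices of cells exceeding the threshold."""
    return [(i, j)
            for i, row in enumerate(field)
            for j, value in enumerate(row)
            if value > threshold]

def io_agent(step, field, threshold=0.8, full_interval=10, roi_interval=1):
    """Decide what to write at this timestep: full-field snapshots every
    `full_interval` steps, regions of interest every `roi_interval` steps."""
    writes = []
    if step % full_interval == 0:
        writes.append(("full", None))
    if step % roi_interval == 0:
        roi = find_regions_of_interest(field, threshold)
        if roi:
            writes.append(("roi", roi))
    return writes

field = [[0.1, 0.9, 0.2],
         [0.3, 0.95, 0.1],
         [0.0, 0.2, 0.4]]
print(io_agent(3, field))   # ROI write only
print(io_agent(10, field))  # full snapshot plus ROI write
```

In a real simulator the detection criterion would be a physical diagnostic (e.g., vorticity) and the writes would go through the parallel I/O layer; the sketch only shows the scheduling logic.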
SALSSA: The Support Architecture for Large Scale Subsurface Analysis (SALSSA) project has developed technologies to support automated setup, job launching, and monitoring of HPC simulations, as well as data and metadata management for those simulations. We also developed a SWIFT-based workflow to couple a hybrid subsurface model composed of a macro-scale continuum code and a micro-scale particle code. The workflow supports adaptive identification of regions of interest for the particle code, as well as adaptive scheduling of the component codes on high-performance computing architectures. The hybrid code enables modeling of physical phenomena that cannot be captured by either the micro- or macro-scale code individually.
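The adaptive region-of-interest step in such a hybrid coupling can be sketched in a few lines. This is not SALSSA code; the gradient criterion, threshold, and 1D field are assumptions chosen for illustration. Cells with steep local gradients, where continuum assumptions tend to break down, are assigned to the micro-scale particle code; the rest stay with the macro-scale continuum code.

```python
# Illustrative sketch (not SALSSA code): partition a 1D domain between a
# continuum (macro) solver and a particle (micro) solver based on the
# local gradient of a field such as concentration.

def partition_domain(field, gradient_threshold):
    """Return (continuum_cells, particle_cells) index lists."""
    continuum, particle = [], []
    for i in range(len(field)):
        left = field[max(i - 1, 0)]
        right = field[min(i + 1, len(field) - 1)]
        gradient = abs(right - left) / 2.0  # central difference, unit spacing
        (particle if gradient > gradient_threshold else continuum).append(i)
    return continuum, particle

concentration = [1.0, 1.0, 0.9, 0.2, 0.1, 0.1]
coarse, fine = partition_domain(concentration, gradient_threshold=0.1)
print("continuum cells:", coarse)  # smooth regions
print("particle cells:", fine)     # steep front
```

Repeating this partitioning as the simulation evolves is what makes the identification adaptive: the particle region tracks the moving front rather than being fixed at setup time.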
ECCE: The Extensible Computational Chemistry Environment is an open source user environment for performing computational chemistry studies through an integrated suite of graphical user interface and scientific visualization applications built on a collaborative data management framework. For more information, download the software.
Future Power Grid: DISC leads the design and development of the GridOPTICS™ framework and several advanced analytical and control tools that we are building for the future power grid.
MeDICi Integration Framework (MIF): MIF is a high performance, scalable component-based integration platform for building computational pipelines that can coordinate analysis over distributed computing resources. MIF is designed to simplify the complexity of integrating heterogeneous codes and handling large data sets and employs mechanisms to ensure a low-friction end-to-end integration solution.
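The component-pipeline pattern that MIF implements can be shown with a minimal sketch. This is not the MIF API; the `Pipeline` class and the example components are invented for illustration. Independent processing components are chained so that each consumes the previous component's output, which is the core abstraction behind a computational pipeline.

```python
# Illustrative sketch (not the MIF API): chain processing components
# into a computational pipeline, in the spirit of MIF.

class Pipeline:
    """Run components in order; each consumes the previous output."""

    def __init__(self):
        self.components = []

    def add(self, component):
        self.components.append(component)
        return self  # allow fluent chaining

    def run(self, data):
        for component in self.components:
            data = component(data)
        return data

# Example components: parse raw records, drop malformed rows, count.
parse = lambda text: [line.split(",") for line in text.splitlines()]
keep_valid = lambda rows: [r for r in rows if len(r) == 2]
count = lambda rows: len(rows)

pipeline = Pipeline().add(parse).add(keep_valid).add(count)
print(pipeline.run("a,1\nb,2\nbad\nc,3"))  # → 3
```

In a real MIF deployment the components would be distributed across computing resources and could wrap heterogeneous codes in different languages; the sketch only shows the composition model.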