Computing Research

Data Management and Integration

Data collected from systems, ranging from field sensors to commercial vendor data services, are often incomplete and require specialized software to be usable. PNNL has extensive experience augmenting metadata and applying semantic technologies and other processing techniques to provide additional context and meaning for data. Our experience ranges from local data access to highly distributed data collections, and from traditional storage systems to high-performance warehousing technologies and cloud-based solutions.

Key Capabilities

  • Data Semantics and Provenance
  • Streaming Data Management
  • Data Engineering
  • Big Data Architectures for Next-Generation Data-Intensive Computing
  • Integration of Large-scale Data Sources
  • Systems Integration

Significant Projects

Analysis in Motion

The Analysis in Motion (AIM) Software Infrastructure (ASI) is a cloud-based software integration platform for studying human-centric streaming data analytics. ASI uses a modular, hierarchical architecture to support real-time streaming algorithms and user interfaces built with diverse design methods, such as extract-transform-load, model-view-controller, peer-to-peer, publish-subscribe, event loops, and microservices. The implementation is cross-platform and language-agnostic, and it supports data provenance.

Atmospheric Radiation Measurement

PNNL’s Data Management and Integration team supports the ARM Climate Research Facility, a DOE national scientific user facility funded through the Office of Biological and Environmental Research. PNNL develops and supports a variety of software components to ensure reliable data products by engaging with instrument mentors, translators, and developers from the nine national laboratories that manage and operate the ARM Facility’s Data Services and Operations group.

ARM Data Integrator

The ARM Data Integrator (ADI) is a suite of tools, C libraries, structures, and interfaces created to simplify the development of algorithms that analyze time-series data and to reduce the associated costs. The ADI architecture and functionality are designed to consolidate diverse time-series datasets into one or more new data products without the need to write any code.
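To make the idea of consolidating diverse time-series datasets concrete, the following is a minimal Python sketch of one common approach: averaging each input series into fixed-width time bins so that all series share one grid. It is not ADI's interface or implementation. The function names (`bin_average`, `consolidate`), the bin-averaging method, and the sample values are all assumptions made for illustration.

```python
from statistics import mean

def bin_average(samples, start, step, n_bins):
    """Average (time, value) samples into n_bins bins of width `step`."""
    bins = [[] for _ in range(n_bins)]
    for t, value in samples:
        index = (t - start) // step
        if 0 <= index < n_bins:          # ignore samples outside the grid
            bins[index].append(value)
    # An empty bin yields None rather than an invented value.
    return [mean(b) if b else None for b in bins]

def consolidate(sources, start, stop, step):
    """Merge named series onto one shared grid: one row per time bin."""
    n_bins = (stop - start) // step
    columns = {
        name: bin_average(samples, start, step, n_bins)
        for name, samples in sources.items()
    }
    return [
        {"time": start + i * step,
         **{name: column[i] for name, column in columns.items()}}
        for i in range(n_bins)
    ]

# Hypothetical inputs: (seconds since start, measured value).
temperature = [(0, 10.0), (30, 12.0), (60, 14.0), (90, 16.0)]  # 30 s sampling
pressure = [(0, 1000.0), (60, 1002.0)]                         # 60 s sampling

rows = consolidate(
    {"temperature": temperature, "pressure": pressure},
    start=0, stop=120, step=60,
)
for row in rows:
    print(row)
```

Each output row holds one value per source for the same time bin, which is the shape a combined data product typically needs. Other choices, such as interpolation or nearest-sample selection, would fit the same structure.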


Atmosphere to Electrons

The Atmosphere to Electrons (A2e) Data Archive and Portal (DAP) is an A2e Initiative Focus Area that provides state-of-the-science data services for the advancement of critical A2e research, communications, and knowledge discovery. The DAP’s objective is to provide secure, timely, easy, and open access to all laboratory, field, and benchmark model data produced by the A2e Initiative.

GridOPTICS Software System

The GridOPTICS™ Software System (GOSS) is an open-source, vendor-independent middleware framework designed specifically to deploy new applications for the future power grid. This resource easily integrates grid applications with sources of data and facilitates straightforward communication between them, providing a foundation for developing a range of applications that will improve grid management.

High Performance Data Analytics

Introduced in 2013 and led by PNNL, High Performance Data Analytics (HPDA) has been exploring, evaluating, and demonstrating the application of high-performance computing technologies to data analytics challenges. HPDA's focus areas include graph analytics, compute-intensive analytics, streaming analytics, exploratory data analysis, and emerging architectures analysis and distributed heterogeneous testbeds.

LightMAT DataHUB

The DataHUB is a key capability within the Lightweight Materials Consortium (LightMAT) that provides secure, timely, easy, and open access to data produced by the LightMAT consortium, a network that includes 10 DOE national laboratories with technical capabilities highly relevant to lightweight materials development. The LightMAT DataHUB collects, stores, catalogs, processes, preserves, and disseminates all significant LightMAT data. These data include codes, models, experimental and simulated data, and journal publications.


Pacifica

Pacifica is an open-source scientific data management platform for harvesting, validating, and distributing data and metadata. It is architected as a flexible set of interchangeable tools used to build custom scientific data management solutions that meet the diverse, changing demands of research in different institutions. Pacifica currently is deployed within the Environmental Molecular Sciences Laboratory and is collecting and storing scientific data and metadata from 135 production instruments.


Velo

Velo is a highly customizable, reusable, domain-independent collaborative knowledge management system based on commercial-grade, open-source technologies for managing scientific work over the entirety of a project’s life cycle. Velo has been deployed operationally in several U.S. government agencies.



VOLTTRON

VOLTTRON™ is an integration platform for software and smart device operations and communications over the smart grid. VOLTTRON employs a message bus architecture that affords robust integration strategies and future extensions to incorporate new and evolving smart grid communication protocols.
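As a rough illustration of what a message bus architecture means in general, the sketch below shows a minimal in-process bus in Python where components publish to named topics and subscribers receive every message whose topic starts with a prefix they registered. This is not VOLTTRON's API. The `MessageBus` class, the prefix-matching rule, and the topic names are hypothetical and chosen only to show why components on a bus do not need to know about each other.

```python
from collections import defaultdict

class MessageBus:
    """Toy in-process bus: subscribers register a topic prefix and a callback."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, prefix, callback):
        """Call `callback(topic, message)` for topics starting with `prefix`."""
        self._subscribers[prefix].append(callback)

    def publish(self, topic, message):
        """Deliver `message` to matching subscribers; return the delivery count."""
        delivered = 0
        for prefix, callbacks in self._subscribers.items():
            if topic.startswith(prefix):
                for callback in callbacks:
                    callback(topic, message)
                    delivered += 1
        return delivered

bus = MessageBus()
received = []

# A hypothetical monitoring component listens to every device in building1.
bus.subscribe("devices/building1/", lambda topic, msg: received.append((topic, msg)))

# Hypothetical device drivers publish readings without knowing who listens.
n1 = bus.publish("devices/building1/thermostat", {"temperature_c": 21.5})
n2 = bus.publish("devices/building2/meter", {"power_kw": 3.2})

print(n1, n2, received)
```

Because publishers and subscribers only share topic names, a new protocol adapter can be added as one more publisher without changing existing subscribers, which is the kind of extensibility a bus design is usually chosen for.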
