Data Sciences

As a leader in high-performance graph analytics, PNNL develops scalable graph methods and platforms to understand network structures and extract actionable information from the entities and relationships embedded in heterogeneous data sets. This applied research helps ensure the operation of critical infrastructures, optimize high-performance computations, identify anomalous behavior, discover communities of interest, and determine patterns of activity in human enterprises. PNNL has also created first-of-a-kind scalable platforms for conventional computer systems that enable analysts to process web-scale data sets without expert knowledge of the target machine.

CAPABILITIES

High-Performance Graph Methods

Many natural and artificial complex systems, including social, biological, cyber, power, and sensor networks, consist of interconnected entities and are best analyzed with graph-based approaches. PNNL delivers novel algorithms for coloring, matching, community detection, influence maximization, anomaly/event detection, and pattern matching that translate raw data into insights.
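To make one of the listed problems concrete, the following is a minimal sketch of greedy graph coloring, a textbook sequential heuristic and not PNNL's implementation. The function name `greedy_coloring`, the largest-degree-first visiting order, and the toy edge list are assumptions chosen for illustration.

```python
from collections import defaultdict


def greedy_coloring(edges):
    """Give each vertex the smallest color unused by its colored neighbors."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    colors = {}
    # Visit high-degree vertices first (a common heuristic, assumed here);
    # ties are broken by vertex name so the result is deterministic.
    for vertex in sorted(adjacency, key=lambda x: (-len(adjacency[x]), x)):
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Toy graph: a triangle (a, b, c) with a pendant vertex d attached to c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
coloring = greedy_coloring(edges)
```

On this toy graph, no edge joins two vertices of the same color, and the triangle forces three colors. Scalable parallel coloring needs additional machinery, such as conflict detection and recoloring, that this sketch omits.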



Combinatorial Scientific Computing

Combinatorial Scientific Computing (CSC) is an interdisciplinary research area that identifies critical problems in scientific computing that can be modeled as combinatorial problems, then solves them by designing appropriate combinatorial algorithms and implementing them efficiently on modern parallel architectures. PNNL and its collaborators contribute to CSC with new algorithms, scalable implementations, and novel applications of several graph algorithms, including matching, coloring, clustering, network alignment, and influence maximization. ExaGraph, a DOE-ASCR Exascale Computing Project co-design center, also focuses on combinatorial algorithms.
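As an illustration of the kind of combinatorial kernel involved, the sketch below implements greedy weighted matching, a standard heuristic whose total weight is guaranteed to be at least half of the optimum. It is a sequential toy and not the algorithm used by PNNL or ExaGraph; the name `greedy_matching` and the sample edges are assumptions.

```python
def greedy_matching(weighted_edges):
    """Pick edges in decreasing weight order, skipping any edge that
    touches an already matched vertex."""
    matched = set()
    matching = []
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching


# A 4-cycle with (u, v, weight) edges.
weighted_edges = [(1, 2, 5.0), (2, 3, 6.0), (3, 4, 5.0), (4, 1, 1.0)]
matching = greedy_matching(weighted_edges)
total_weight = sum(w for _, _, w in matching)
```

Here the greedy choice takes the heaviest edge (2, 3) first and is then left with only (4, 1), for a total weight of 7.0, while the optimal matching {(1, 2), (3, 4)} has weight 10.0. The example shows why such heuristics trade solution quality for speed and simplicity.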


Knowledge Representation and Reasoning

PNNL develops intelligent systems that can learn from raw data and answer complex queries. Our technical focus is twofold: 1) scalable learning of semantic graphs or continuous feature representations of graph models and 2) algorithm development for end-user problems, such as event detection, relation classification, or explaining observations via inference. This research impacts applications in natural language processing, cyber security, and health informatics. A key aspect of the work involves integrating graph algorithms and deep learning by building on frameworks such as Apache Spark and TensorFlow.


Property Graphs

Property graphs represent heterogeneous networks with attributed vertices and edges. Because many datasets are proprietary, constructing realistic property graphs with the same statistical properties and connectivity is critical for privacy preservation and benchmarking. PNNL’s Property Graph Model (PGM) approach uses a label augmentation strategy to preserve the vertex label and edge connectivity distributions, as well as their correlation, while also replicating the degree distribution. Ongoing research involves defining more effective representations of joint distributions and improved tuning of the label augmentation process.
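The statistics named above can be made concrete with a small sketch. The code below does not implement PGM or label augmentation; it only computes, for a toy labeled graph, the three summaries a generated graph would be compared against: the vertex label distribution, the degree distribution, and the count of edges between each pair of labels. The function name `summarize`, the labels, and the edges are all hypothetical.

```python
from collections import Counter

# Hypothetical toy property graph: vertex id -> label, plus undirected edges.
vertex_labels = {0: "person", 1: "person", 2: "company", 3: "company"}
edges = [(0, 1), (0, 2), (1, 2), (1, 3)]


def summarize(vertex_labels, edges):
    """Return label distribution, degree distribution, and label-pair edge counts."""
    label_dist = Counter(vertex_labels.values())
    degree = Counter()
    label_pairs = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        # Sort the two labels so (person, company) and (company, person) coincide.
        pair = tuple(sorted((vertex_labels[u], vertex_labels[v])))
        label_pairs[pair] += 1
    # Number of vertices having each degree (isolated vertices count as degree 0).
    degree_dist = Counter(degree[v] for v in vertex_labels)
    return label_dist, degree_dist, label_pairs


label_dist, degree_dist, label_pairs = summarize(vertex_labels, edges)
```

Comparing these summaries between an original graph and a generated one is one simple way to check how closely the generated graph reproduces the original's labels, degrees, and label-to-label connectivity.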


Software Platforms

PNNL maintains several scalable, high-performance software platforms that support complex analytic workflows for varied applications.





System Platforms

Because many graph methods have irregular memory access patterns and low computation-to-communication ratios, conventional parallel runtime systems support them poorly. PNNL provides scalable runtime systems (GMT and AM++) to execute graph algorithms at scale. These runtime systems exploit a more dynamic data- and task-parallel execution model than conventional bulk synchronous execution models. They also provide a global address space, atomic memory operations, and aggressive message aggregation to tolerate data movement latency.
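The idea behind message aggregation can be sketched in a few lines. The class below is a single-process simulation and does not use or reproduce the GMT or AM++ APIs; the class name `AggregatingSender`, its methods, and the buffer size are assumptions. It buffers small updates per destination and counts one network message per flushed buffer.

```python
from collections import defaultdict


class AggregatingSender:
    """Simulated sender that batches small payloads per destination."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffers = defaultdict(list)    # destination -> pending payloads
        self.delivered = defaultdict(list)  # destination -> received payloads
        self.messages_sent = 0              # one count per flushed buffer

    def send(self, destination, payload):
        buf = self.buffers[destination]
        buf.append(payload)
        if len(buf) >= self.buffer_size:
            self._flush(destination)

    def _flush(self, destination):
        buf = self.buffers[destination]
        if buf:
            # A real runtime would issue one network transfer here.
            self.delivered[destination].extend(buf)
            self.messages_sent += 1
            buf.clear()

    def flush_all(self):
        for destination in list(self.buffers):
            self._flush(destination)


sender = AggregatingSender(buffer_size=4)
for i in range(10):
    sender.send(destination=i % 2, payload=i)
sender.flush_all()
```

In this run, ten fine-grained updates to two destinations are delivered in four aggregated messages. Fewer, larger transfers amortize per-message overhead, which is the motivation for aggregation when individual updates are small and irregular.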
