Graph Analytics

Data essential to scientific and national security problems often exist as networks or graphs that grow more complex as the amount of data grows. PNNL is pioneering graph analytics and network science to analyze these complex relationships and assist analysts through novel visualization and machine learning.

In high-performance graph analytics, PNNL has developed scalable graph platforms to understand network structures and extract actionable information embedded in heterogeneous data sets. One such development is Grappolo, a tool that detects meaningful data clusters among billions in under an hour on massively parallel processors. The tool runs on a variety of high-performance computing platforms, including graphics processing units. Another platform, Scalable High-Performance Algorithms and Data Structures (SHAD), offers a high-level shared-memory programming environment and general-purpose data structures for implementing graph algorithms, associated algorithms, and applications.
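To make the idea of a "meaningful" cluster concrete, the following minimal sketch scores a grouping of nodes with modularity, one widely used measure of cluster quality. It is an illustration only: the text above does not say which measure Grappolo optimizes, and the function name `modularity`, the edge list, and the community labels are all hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    Assumes at least one edge and no self-loops (illustrative simplification).
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the same community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Fraction of edges inside each community minus the fraction expected by chance.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical toy graph: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in one community
```

Grouping each triangle separately scores higher than lumping all nodes together, which matches the intuition that the triangles are the natural clusters in this toy graph.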

One challenge of graph analysis is understanding the relationship between the data and their sources. In knowledge graphs, semantic relationships are represented as nodes and edges, allowing users to answer key questions and discover emerging trends via visualizations. StreamWorks combines streaming graph analytics and knowledge graphs to analyze multiple streaming data sources and identify emerging patterns of sophisticated cyber-attacks. The tool alerts analysts to incidents that match these patterns and provides a description of the potential threat along with the rationale for identifying it. Similarly, in large, complex systems such as the power grid, data come from multiple sources such as generating stations, transformers, and homes. PNNL researchers, with collaborators at the Northwest Institute for Advanced Computing, are tackling the challenge of gaining insight from this unstructured, multi-source graph data as part of the HAGGLE (Hybrid Attributed Generic Graph Library Environment) project.
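As a rough illustration of how a knowledge graph stores semantic relationships as nodes and edges, the sketch below keeps a handful of (subject, predicate, object) triples and answers simple questions by pattern matching. It is not StreamWorks or HAGGLE code; the triples, the predicate names, and the helpers `match` and `two_hop` are hypothetical and chosen only to show the idea.

```python
# Hypothetical triples (subject, predicate, object); not drawn from any real data set.
triples = [
    ("host_a", "logged_into", "server_1"),
    ("host_a", "located_in", "subnet_x"),
    ("server_1", "stores", "payroll_db"),
    ("host_b", "logged_into", "server_1"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

def two_hop(triples, first, second):
    """Find chains (a, b, c) where a -first-> b and b -second-> c."""
    return [
        (a, b, c)
        for (a, p1, b) in triples if p1 == first
        for (b2, p2, c) in triples if p2 == second and b2 == b
    ]

logins = match(triples, predicate="logged_into")
# Which hosts can reach which data stores through a login?
reach = two_hop(triples, "logged_into", "stores")
```

A streaming system would evaluate patterns like `two_hop` incrementally as new edges arrive rather than rescanning the full list; this sketch rescans for simplicity.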

While typical graphs model binary relationships, hypergraphs model multi-way relationships, exposing the interconnectedness of the data without artificially generating two-way relationships. HyperNetX (HNX) is an open-source Python library developed to analyze and visualize multi-way relationships modeled as hypergraphs. These relationships can be found in cyber data, protein pathways, bibliographic networks, and social media, where interactions involve multiple entities simultaneously.
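The difference between binary and multi-way relationships can be seen in a small, self-contained sketch. It deliberately uses plain Python dictionaries rather than the HyperNetX API, and the author and paper names are hypothetical. Each hyperedge (a paper) links all of its authors at once; flattening it into pairwise edges loses the information about who actually worked together as a group.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Flatten each hyperedge into every two-way edge it implies."""
    pairs = set()
    for members in hyperedges.values():
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs

# Hypothetical bibliographic data: one three-author paper ...
three_way = {"paper1": {"alice", "bob", "carol"}}
# ... versus three separate two-author papers.
two_way = {
    "p1": {"alice", "bob"},
    "p2": {"bob", "carol"},
    "p3": {"alice", "carol"},
}

# Both hypergraphs collapse to the same ordinary graph (a triangle),
# even though they describe different collaborations.
same_graph = clique_expansion(three_way) == clique_expansion(two_way)
same_hypergraph = (
    sorted(map(sorted, three_way.values())) == sorted(map(sorted, two_way.values()))
)
```

Here `same_graph` is true while `same_hypergraph` is false: the pairwise view cannot tell the two situations apart, whereas the hypergraph keeps them distinct.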

PNNL has a deep understanding of the mathematical principles governing the digital landscape and has made several theoretical and practical contributions to the field of multi-network analysis. Our graph analytics technologies have been deployed in domains that include threat detection for national security, cyber analytics, scientific computing, intellectual property portfolio analysis, energy grid reliability, environmental safety, training, and law enforcement.