October 17, 2019
Web Feature

Sharing the Load Speeds Machine Learning

Balancing workload quickly finds meaning hidden in graph data

Machine Learning for Graphs

Joshua Sortino

In today’s digital age, the rabbit hole of connected information can be not only a time sink, but downright overwhelming. Even for high-performance computers.

With so much data linked to so much other data, many organizations capture and convey related information through graphs. But when grouped together for analyses, graphs can pose another problem. Currently, computers struggle to efficiently process and analyze the patterns hidden behind the bulk. When speed is of the essence, this deficiency becomes a frustrating bottleneck.

A team of researchers led by PNNL thinks they may have a solution. The team’s idea builds on an emerging machine learning approach called graph convolutional networks, or GCNs, which infer patterns from graph data.

The researchers designed hardware accelerators that can automatically estimate and adjust—or auto-tune—workload duties among thousands of parallel processing units. After converging on the ideal workload balance, that configuration runs on subsequent processing iterations to quickly arrive at a solution.
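To make the auto-tuning idea concrete, here is a minimal software sketch in Python. It is not the team's hardware design, which this article does not detail. The function names (`pe_loads`, `autotune`), the round-robin starting point, and the rule of moving one row at a time from the busiest unit to the idlest one are all assumptions chosen for illustration.

```python
def pe_loads(assignment, row_work, num_pes):
    """Total work (e.g., nonzero entries to multiply) assigned to each
    processing element (PE)."""
    loads = [0] * num_pes
    for row, pe in enumerate(assignment):
        loads[pe] += row_work[row]
    return loads


def autotune(row_work, num_pes, max_rounds=100):
    """Hypothetical software analogue of auto-tuning (an assumption, not the
    published hardware mechanism): start from a naive round-robin split, then
    repeatedly move one row from the busiest PE to the idlest PE for as long
    as a move narrows the gap between them."""
    assignment = [row % num_pes for row in range(len(row_work))]
    for _ in range(max_rounds):
        loads = pe_loads(assignment, row_work, num_pes)
        busiest = max(range(num_pes), key=loads.__getitem__)
        idlest = min(range(num_pes), key=loads.__getitem__)
        gap = loads[busiest] - loads[idlest]
        # Rows on the busiest PE that are small enough to narrow the gap.
        candidates = [r for r, pe in enumerate(assignment)
                      if pe == busiest and 0 < row_work[r] < gap]
        if not candidates:
            break  # converged: no single move improves the balance
        # Prefer the row whose size is closest to half the gap.
        best = min(candidates, key=lambda r: abs(row_work[r] - gap / 2))
        assignment[best] = idlest
    return assignment


# Skewed example: two rows hold most of the work.
row_work = [40, 1, 1, 1, 30, 1, 1, 1, 2, 1, 1, 1]
before = pe_loads([r % 4 for r in range(len(row_work))], row_work, 4)
tuned = autotune(row_work, 4)
after = pe_loads(tuned, row_work, 4)
print("peak load before:", max(before), "after:", max(after))
```

Once the loop stops changing the assignment, the same `tuned` mapping can be reused for every later pass over the graph, which mirrors the article's point that the converged configuration runs on subsequent iterations.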

The authors believe their workload-balancing approach is the first hardware accelerator design targeted to emerging GCNs. The team also believes their work is the first to auto-tune the workload balance through hardware rather than software for accelerated machine learning applications.

The new GCN technique could improve the speed of data discovery for safety and security, such as locating faults in power distribution systems, detecting risks in accounting systems, and autonomous motion tracking. The approach could also accelerate scientific discoveries in areas such as predicting chemical reactions and material properties, classifying particles in high energy physics, and modeling the side effects of medications.

The U.S. Department of Energy’s Office of Advanced Scientific Computing Research supported the research through PNNL’s Center for Advanced Technology Evaluation. PNNL also funded the research through its High-Performance Data Analytics Program and Data-Model Convergence initiative under the Laboratory Directed Research and Development Program.

Clearing Up a Convoluted Situation

Tong Geng, a doctoral student from the Computer Architecture and Automated Design Lab of Boston University, conceived and led the GCN research during his second stint as a PNNL intern. The idea began forming a year ago, while he was working on a deep-learning project on binary neural networks during his first internship at PNNL.

That experience got him thinking about the limitations of a different type of machine learning: convolutional neural networks. These networks extract information from regular, dense, grid-structured data sources such as photos, movies, audio clips, or text.

Conversely, GCNs infer relationships in sporadic, sparse, and irregularly connected graph data from sources such as social networks, organic chemical formulas, and electric grid network components. Because graph data are so irregular, inferring results takes a long time when traditional parallel processing approaches are used. But until now, efforts to accelerate machine-learning techniques have focused only on traditional neural networks.
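For readers who want to see what a GCN computes, the sketch below implements a single graph-convolution layer in plain Python. It assumes the widely used formulation in which each node averages its neighbors' features using a degree-normalized adjacency matrix with self-loops, then applies a weight matrix and a ReLU. The article does not say which variant the team accelerated, and the function name `gcn_layer` and the adjacency-list input are illustrative choices.

```python
def gcn_layer(neighbors, features, weights):
    """One graph-convolution layer: ReLU(A_hat @ X @ W), where A_hat is the
    adjacency matrix with self-loops, symmetrically normalized by degree.

    neighbors: adjacency lists of an undirected graph (sparse representation)
    features:  one feature vector per node (X)
    weights:   in_dim x out_dim weight matrix (W)
    """
    n = len(neighbors)
    out_dim = len(weights[0])
    # Degree of each node, counting the self-loop added to every node.
    deg = [len(neighbors[i]) + 1 for i in range(n)]
    # Dense step: multiply every node's features by the weight matrix.
    xw = [[sum(x * weights[k][j] for k, x in enumerate(row))
           for j in range(out_dim)] for row in features]
    out = []
    for i in range(n):
        acc = [0.0] * out_dim
        # Sparse step: the work for node i grows with its neighbor count.
        for j in list(neighbors[i]) + [i]:
            scale = 1.0 / (deg[i] * deg[j]) ** 0.5
            for c in range(out_dim):
                acc[c] += scale * xw[j][c]
        out.append([max(0.0, v) for v in acc])  # ReLU
    return out


# Tiny path graph 0 - 1 - 2 with one feature per node.
result = gcn_layer([[1], [0, 2], [1]], [[1.0], [1.0], [1.0]], [[1.0]])
print(result)
```

The inner loop runs once per neighbor, so a node with thousands of connections costs far more than a node with two. In real graphs such as social networks, that unevenness is what makes an even split of nodes across processors an uneven split of work.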

Drawing on his first internship, Geng said he realized that “GCNs would become the next hot topic in machine learning” and proposed the new project, with the goal of significantly accelerating the GCN inference rate by reducing processing delays and energy use.

How? Near-ideal workload balance.

Big Data, No Waiting

Ang Li, a computer scientist in PNNL’s High-Performance Computing Division, advised Geng during his first and second internships. He explained that with ideal workload balance, the work is quickly and evenly distributed among processors that are constantly running independently in parallel; that is, they are not waiting on each other. With no idle time, the processors finish their workload simultaneously.

“When the processors reach nearly 100 percent utilization, that is near ideal workload distribution,” said Li.
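Li's definition can be turned into a simple number. The sketch below assumes that all processors start together and that a step ends only when the busiest one finishes, so utilization is total work divided by the processor count times the peak load. The function name `utilization` and the sample loads are illustrative and are not taken from the paper.

```python
def utilization(loads):
    """Fraction of available processor time spent on useful work, assuming
    all processors run in parallel and the step ends when the busiest one
    finishes (an assumed model, not a measurement from the paper)."""
    peak = max(loads)
    if peak == 0:
        return 1.0  # nothing to do, so nobody is kept waiting
    return sum(loads) / (len(loads) * peak)


balanced = utilization([10, 10, 10, 10])   # every processor equally busy
skewed = utilization([40, 0, 0, 0])        # one processor does everything
print(f"balanced: {balanced:.0%}, skewed: {skewed:.0%}")
```

Under this model, both cases do the same 40 units of work, but the skewed split takes four times as long because three processors sit idle while the fourth finishes.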

The team tested their approach using massive graph data from four large citation databases and a social network. The ultra-workload-balanced design compiled statistical results up to nearly 300 times faster than standard parallel processors that ran without the rebalancing technique. The process also consumed 4000 times less energy.

Because of its efficiency, Geng thinks the workload-balancing approach could replace other convolution network techniques. In the meantime, Li and his colleagues plan to build a cluster of GCN accelerators for processing even larger graphs.

Reference: UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing. Tong Geng, Ang Li, Tianqi Wang, Chushu Wu, Yanfei Li, Antonino Tumeo, and Martin Herbordt. arXiv: https://arxiv.org/abs/1908.10834 
