AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing
The recent development of deep learning has mostly focused on Euclidean data, such as images, videos, and audio. However, much real-world information and many relationships are naturally expressed as graphs. To learn efficiently from graph data, graph convolutional networks (GCNs) have emerged as a promising approach, showing advantages in practical applications such as social network analysis, knowledge discovery, 3D modeling, and motion capture. Real-world graphs are usually extremely large and imbalanced, posing significant performance demands and design challenges on hardware dedicated to GCN inference. In this paper, we propose an architecture design called AWB-GCN to accelerate graph convolutional network inference. To tackle the major performance bottleneck of workload imbalance, we propose dynamic neighborhood stealing and remote chunk shuffling techniques, relying on hardware flexibility to achieve hardware auto-tuning with negligible area and delay overhead. Specifically, AWB-GCN is able to profile the sparse graph pattern while continuously adjusting the workload distribution via routing reconfiguration among parallel processing elements (PEs). The ideal configuration is then reused in the remaining iterations. To the best of our knowledge, this is the first accelerator design particularly for GCNs and the first work relying on hardware auto-tuning, which is normally done in software, to achieve near-optimal workload balance in processing sparse structures.
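The core rebalancing idea, moving work between adjacent PEs until per-PE loads even out, can be illustrated in software. The sketch below is an assumption for exposition, not the paper's hardware mechanism: it greedily shifts boundary rows of a sparse matrix between neighboring PE chunks, accepting a shift only if it strictly reduces the heavier load of the pair (the `rebalance` function and its greedy policy are hypothetical).

```python
def rebalance(row_nnz, num_pes, rounds=50):
    """Greedy software model of neighbor-to-neighbor workload rebalancing.

    row_nnz: nonzeros per matrix row (the per-row workload).
    Returns chunk boundaries such that PE p processes rows
    [bounds[p], bounds[p+1]).
    """
    n = len(row_nnz)
    # Start from a naive even split of rows across PEs.
    bounds = [i * n // num_pes for i in range(num_pes + 1)]

    def load(p):
        return sum(row_nnz[bounds[p]:bounds[p + 1]])

    for _ in range(rounds):
        moved = False
        for p in range(num_pes - 1):
            left, right = load(p), load(p + 1)
            if left > right and bounds[p + 1] - bounds[p] > 1:
                # Candidate: give PE p's last row to PE p+1.
                shift = row_nnz[bounds[p + 1] - 1]
                if max(left - shift, right + shift) < left:
                    bounds[p + 1] -= 1
                    moved = True
            elif right > left and bounds[p + 2] - bounds[p + 1] > 1:
                # Candidate: give PE p+1's first row to PE p.
                shift = row_nnz[bounds[p + 1]]
                if max(right - shift, left + shift) < right:
                    bounds[p + 1] += 1
                    moved = True
        if not moved:  # converged: no strictly improving move remains
            break
    return bounds
```

Because each accepted shift strictly reduces the heavier load of the affected pair and leaves all other PEs untouched, the global maximum load never increases; once no improving shift exists, the configuration is frozen and can be reused for the remaining iterations, mirroring the one-time tuning described above.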
Published: October 17, 2020 | Revised: January 5, 2021