January 1, 2018
Conference Paper

Evaluating On-Node GPU Interconnects for Deep Learning Workloads

Abstract

Scaling deep learning workloads across multiple GPUs on a single node has become increasingly important in data analytics. A key question is how well a PCIe-based GPU interconnect can perform relative to a custom high-performance interconnect such as NVIDIA's NVLink. This paper evaluates two such on-node interconnects for eight NVIDIA Pascal P100 GPUs: (a) the NVIDIA DGX-1's NVLink 1.0 "hybrid cube mesh"; and (b) the Cirrascale GX8's two-level PCIe tree using dual SR3615 switch risers. To capture the effects of a range of neural network workloads, we define a parameterized version of the popular ResNet. We define a workload intensity metric that characterizes the expected computation/communication ratio; we also locate AlexNet and GoogLeNet within that space. As expected, the DGX-1 typically has superior performance. However, when equalizing GPU SM frequencies, the GX8 is very competitive on all ResNet workloads. With 8 GPUs, the GX8 can outperform the DGX-1 on all-to-all reductions by 10% for medium-sized payloads; and in rare cases, the GX8 slightly outperforms the DGX-1 on ResNet.
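The abstract's workload intensity metric relates per-GPU computation to the gradient traffic exchanged in all-to-all reductions. The paper's exact definition is not given here; the following is a minimal Python sketch of one plausible formulation (per-step FLOPs per GPU divided by approximate all-reduce bytes per GPU), with all numeric values assumed for illustration only.

```python
# Hypothetical computation/communication intensity estimate for
# data-parallel training; NOT the paper's actual metric definition.

def intensity(flops_per_sample, batch_per_gpu, num_params, bytes_per_param=4):
    """Estimate FLOPs of useful work per byte of gradient traffic.

    A ring all-reduce moves roughly 2 * (N - 1) / N * payload bytes per GPU,
    which approaches 2 * payload as the GPU count N grows; that bound is
    used here for simplicity.
    """
    compute_flops = flops_per_sample * batch_per_gpu      # work per training step
    comm_bytes = 2 * num_params * bytes_per_param         # approx. all-reduce traffic
    return compute_flops / comm_bytes                     # FLOPs per byte moved

# Example with ResNet-50-like numbers (assumed, not measured in the paper):
print(intensity(flops_per_sample=7.7e9, batch_per_gpu=32, num_params=25.6e6))
```

Under this kind of metric, larger batches or deeper networks raise intensity (more compute per byte communicated), which is consistent with the abstract's observation that interconnect differences matter most for communication-heavy configurations.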

Revised: June 12, 2019 | Published: January 1, 2018

Citation

Tallent N.R., N.A. Gawande, C.M. Siegel, A. Vishnu, and A. Hoisie. 2018. Evaluating On-Node GPU Interconnects for Deep Learning Workloads. In Proceedings of the 8th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 2017), November 13, 2017, Denver, CO. Lecture Notes in Computer Science, edited by S. Hammond, S. Jarvis and S. Wright, 10724, 3-21. Cham: Springer Verlag. PNNL-SA-129849. doi:10.1007/978-3-319-72971-8_1