July 1, 2020
Journal Article

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Abstract

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors (including NVIDIA, Intel, AMD, and IBM) have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak scaling, which is sometimes encouraged by restricted GPU memory, NVLink is less important.
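
For readers unfamiliar with the metrics the abstract names, the following is a minimal sketch of how strong-scaling efficiency, weak-scaling efficiency, and performance/watt are commonly computed. These are the standard textbook definitions and are not necessarily the exact formulas used in the paper. The function names and all input numbers are hypothetical and are not results from the article.

```python
# Illustrative definitions only; function names and inputs are hypothetical
# and do not come from the paper.

def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Fixed total problem size: ideal time on n devices is t1 / n."""
    return t1 / (n * tn)

def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Fixed problem size per device: ideal time on n devices equals t1."""
    return t1 / tn

def perf_per_watt(images_per_sec: float, watts: float) -> float:
    """Training throughput normalized by average power draw."""
    return images_per_sec / watts

if __name__ == "__main__":
    # Made-up timings (seconds per epoch) for 1 device and 8 devices.
    print(strong_scaling_efficiency(t1=100.0, tn=15.0, n=8))
    print(weak_scaling_efficiency(t1=100.0, tn=110.0))
    # Made-up throughput (images/s) and power (W).
    print(perf_per_watt(images_per_sec=3000.0, watts=1500.0))
```

Under these definitions, a value of 1.0 for either efficiency means ideal scaling. In weak scaling the per-device batch stays fixed as devices are added, so each device does more computation per gradient exchange than it would under strong scaling, which is consistent with the abstract's remark that the interconnect matters less in that regime.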

Revised: May 7, 2020 | Published: July 1, 2020

Citation

Gawande N.A., J.A. Daily, C. Siegel, N.R. Tallent, and A. Vishnu. 2020. Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. Future Generation Computer Systems 108. PNNL-SA-134513. doi:10.1016/j.future.2018.04.073