November 17, 2019
Conference Paper

BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets

Abstract

Binarized neural networks (BNNs) promise tremendous performance improvement over traditional DNNs through simplified bit-level computation and significantly reduced memory access and storage cost. They also offer low cost, low energy consumption, and high robustness, showing great potential in resource-constrained, volatile, and latency-critical applications, which are critical for future HPC execution. However, the promised performance gain of BNN inference has never been fully demonstrated on general-purpose processors, particularly GPUs, due to: (i) the challenge of extracting and leveraging sufficient fine-grained bit-level parallelism to saturate GPU cores when the batch size is small; (ii) the fundamental design conflict between the bit-based BNN algorithm and the underlying word-based architecture; and (iii) BNN network designs that are unfriendly to both the architecture and performance. To address (i) and (ii), we propose the binarized-soft-tensor-core, a software-hardware codesign approach that constructs bit-manipulation capability for modern GPUs to effectively harvest the emerging bit-level parallelism. To tackle (iii), we propose intra- and inter-layer fusion techniques so that the entire BNN inference can be packed into a single GPU kernel, avoiding the high cost of frequent kernel launches. Experiments demonstrate that our design can achieve over 1000x speedup for raw inference latency and 10x for inference throughput over the state-of-the-art full-precision simulated BNN inference for AlexNet on ImageNet.
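
The abstract does not spell out the bit-level arithmetic. As a minimal sketch, assuming the formulation common in the BNN literature (weights and activations restricted to +1/-1 and packed one bit per value), a dot product reduces to an XOR followed by a population count. The helper names `pack_bits` and `binary_dot` are hypothetical and chosen only for illustration; this is plain Python, not the paper's GPU implementation.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n (assumed encoding: 1 -> +1, 0 -> -1)."""
    # XOR marks the positions where the signs differ. Each mismatch contributes -1
    # to the dot product and each match contributes +1, so dot = n - 2 * mismatches.
    mismatches = bin(a_word ^ b_word).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

# Check the packed result against the ordinary element-wise dot product.
reference = sum(x * y for x, y in zip(a, b))
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == reference
```

Under this encoding, one machine word carries many vector elements at once, which is the kind of bit-level parallelism that a word-based architecture has to be coaxed into exploiting.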

Revised: January 2, 2020 | Published: November 17, 2019

Citation

Li A., T. Geng, T. Wang, M. Herbordt, S. Song, and K.J. Barker. 2019. BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets. In International Conference for High Performance Computing, Networking, Storage, and Analysis, November 17-22, 2019, Denver, CO, Article No. a38. Los Alamitos, California: IEEE Computer Society. PNNL-SA-142851. doi:10.1145/3295500.3356169