January 1, 2021
Journal Article

O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference

Abstract

Binarized Neural Networks (BNNs) have drawn tremendous attention due to their significantly reduced computational complexity and memory demand. They have shown great potential especially in cost- and power-restricted domains, such as IoT and smart edge devices, where reaching a certain accuracy bar is often sufficient and real-time performance is highly desired. In this work, we demonstrate that the highly condensed BNN model can be shrunk significantly further by dynamically pruning irregular redundant edges. Based on two new observations of BNN-specific properties, an out-of-order (OoO) architecture, O3BNN-R, can curtail edge evaluation in cases where the binary output of a neuron can be determined early. Similar to Instruction-Level Parallelism (ILP), these fine-grained, irregular, runtime pruning opportunities are traditionally presumed to be difficult to exploit. To increase the pruning opportunities, we also optimize the training process by adding two regularization terms to the loss function: (1) for pooling pruning and (2) for threshold pruning. We evaluate our design on an FPGA platform using three well-known networks: VggNet-16 and AlexNet for ImageNet, and a VGG-like network for Cifar-10.
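The threshold-pruning idea described above can be illustrated with a minimal sketch: a binarized neuron accumulates XNOR contributions edge by edge, and evaluation stops as soon as the remaining edges can no longer flip which side of the threshold the sum lands on. The function name and structure below are illustrative assumptions, not the paper's hardware implementation.

```python
def binarized_neuron_early_exit(weights, inputs, threshold):
    """Evaluate one binarized neuron with early termination.

    weights, inputs: sequences of +1/-1 values (one entry per edge).
    threshold: the neuron outputs +1 iff the XNOR-popcount sum >= threshold.
    Returns (binary output, number of edges actually evaluated).
    """
    n = len(weights)
    partial = 0
    for i, (w, x) in enumerate(zip(weights, inputs)):
        partial += 1 if w == x else -1  # XNOR contribution of this edge
        remaining = n - (i + 1)
        # Output already determined: even if every remaining edge pulled the
        # sum the other way, the threshold comparison could not change.
        if partial - remaining >= threshold:
            return 1, i + 1   # remaining edges pruned
        if partial + remaining < threshold:
            return -1, i + 1  # remaining edges pruned
    return (1 if partial >= threshold else -1), n
```

For example, with eight fully matching edges and a threshold of 0, the output is decided after only four edges; the other four are pruned at runtime, which is the kind of fine-grained, irregular saving the OoO architecture is designed to harvest.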

Revised: October 8, 2020 | Published: January 1, 2021

Citation

Geng, T., A. Li, T. Wang, C. Wu, Y. Li, R. Shi, W. Wu, et al. 2021. "O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference." IEEE Transactions on Parallel and Distributed Systems 32, no. 1: 199-213. PNNL-SA-148318. doi:10.1109/TPDS.2020.3013637