In this work, we demonstrate that the highly condensed BNN model can be shrunk significantly further by dynamically pruning irregular redundant edges. Based on two new observations of BNN-specific properties, our out-of-order (OoO) architecture, O3BNN, can curtail the evaluation of remaining edges in cases where the binary output of a neuron can be determined early. As with Instruction-Level Parallelism (ILP), these fine-grained, irregular, runtime pruning opportunities have traditionally been presumed difficult to exploit. We evaluate our design on an FPGA platform using three well-known networks: VggNet-16 and AlexNet for ImageNet, and a VGG-like network for Cifar-10. Results show that our out-of-order approach can prune 27%, 16%, and 42% of the operations for the three networks respectively, without any accuracy loss, leading to speedups of at least 1.7x, 1.5x, and 2.1x over state-of-the-art FPGA, GPU, and CPU BNN implementations. Because our approach prunes at inference time, no retraining or fine-tuning is needed. Although we demonstrate our design on an FPGA platform, this is only to showcase the method; the approach does not rely on any FPGA-specific features and can thus be adopted on other devices as well.
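To illustrate the early-termination idea behind this kind of runtime pruning, consider a binarized neuron whose output is the sign of a +/-1 dot product compared against a threshold: once the partial sum can no longer cross the threshold in either direction, the remaining edges are redundant. The following Python sketch is our own minimal illustration of that principle, not the authors' O3BNN hardware logic; the function name, the +/-1 encoding, and the example threshold are assumptions for exposition.

    def binarized_neuron_early_exit(weights, inputs, threshold):
        """Evaluate a +/-1 binarized neuron, skipping edges once the
        binary output is already determined.
        Returns (output, number_of_edges_actually_evaluated)."""
        assert len(weights) == len(inputs)
        partial = 0
        remaining = len(weights)  # max absolute change the rest can contribute
        for i, (w, x) in enumerate(zip(weights, inputs)):
            partial += w * x      # each term is +1 or -1
            remaining -= 1
            # If even the best/worst case for the remaining edges cannot
            # move the sum across the threshold, the output is decided
            # and the remaining edge evaluations can be pruned.
            if partial - remaining >= threshold:
                return +1, i + 1  # guaranteed >= threshold: output is +1
            if partial + remaining < threshold:
                return -1, i + 1  # guaranteed < threshold: output is -1
        return (+1 if partial >= threshold else -1), len(weights)

    # Hypothetical example: with a permissive threshold, the sign is
    # often decided after only a few of the edges.
    out, used = binarized_neuron_early_exit(
        weights=[+1, -1, +1, +1, -1, +1, +1, +1],
        inputs=[+1, +1, +1, +1, +1, +1, +1, +1],
        threshold=-6,
    )
    print(out, used)  # +1 after evaluating 1 of the 8 edges

Because which edges become redundant depends on the runtime inputs, the pruning opportunities are irregular and data-dependent, which is why an out-of-order evaluation scheme, rather than a static pruning pass, is needed to exploit them.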
Revised: February 10, 2021
Published: August 14, 2019
Citation
Geng T., T. Wang, C. Wu, C. Yang, W. Wu, A. Li, and M. Herbordt. 2019. O3BNN: An Out-of-Order Architecture for High-Performance Binarized Neural Network Inference with Fine-Grained Pruning. In Proceedings of the International Conference on Supercomputing, June 2019, Phoenix, AZ, 461-472. New York, NY: Association for Computing Machinery. PNNL-SA-141065. doi:10.1145/3330345.3330386