Recurrent neural networks (RNNs) are widely applied to temporal sequence analysis, where real-time performance is usually required. However, RNNs carry a heavy computational workload because the models contain large weight matrices. To alleviate this, model compression (pruning) schemes have been proposed for RNNs that prune redundant (near-zero) weight values. On the one hand, non-structured pruning methods achieve a considerable pruning rate but introduce computational irregularity, which is unfriendly to parallel hardware. On the other hand, existing structured pruning methods account for hardware parallelism; however, they suffer from a poor pruning rate due to restrictive constraints on the pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework built on a novel compressed structured block (CSB) technique. A CSB-pruned RNN model has both fine granularity, which benefits the pruning rate, and a regular structure, which facilitates hardware parallelism. Further, we propose a novel hardware architecture for inference on the CSB-pruned model. Unlike conventional parallel hardware, this architecture solves the block-workload imbalance issue and achieves over 95% hardware utilization. In experiments on 10 RNN models across 5 application domains, CSB-RNN achieves 7×-20× lossless compression and up to 50× lossy compression with acceptable accuracy loss, a 2×-7× improvement over the prior art. With the addition of the novel hardware, compressed-RNN inference reaches a faster-than-real-time latency of 10-400 µs in an FPGA implementation.
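The abstract does not spell out the CSB format, so the following is only a minimal sketch of the general idea of block-structured pruning, under stated assumptions: the weight matrix is split into fixed-size blocks, and inside each block whole rows and columns are pruned by L2 norm, leaving a small dense sub-block plus its row/column indices. The names `csb_prune`, `csb_matvec`, and the parameters `bh`, `bw`, `keep_r`, `keep_c` are hypothetical and are not the paper's API; the pruning criterion is likewise an assumption.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# Each stored block: (kept global row indices, kept global column indices, dense values).
Block = Tuple[List[int], List[int], Matrix]


def l2(values: List[float]) -> float:
    return sum(v * v for v in values) ** 0.5


def csb_prune(W: Matrix, bh: int, bw: int, keep_r: int, keep_c: int) -> List[Block]:
    """Hypothetical block-structured pruning (an assumption, not the paper's exact
    method): split W into bh x bw blocks; in each block keep the keep_r rows, then
    the keep_c columns, with the largest L2 norm, and store the survivors densely."""
    n_rows, n_cols = len(W), len(W[0])
    blocks: List[Block] = []
    for r0 in range(0, n_rows, bh):
        for c0 in range(0, n_cols, bw):
            rows = list(range(r0, min(r0 + bh, n_rows)))
            cols = list(range(c0, min(c0 + bw, n_cols)))
            # Rank the block's rows by magnitude and keep the strongest ones.
            rows = sorted(rows, key=lambda r: -l2([W[r][c] for c in cols]))[:keep_r]
            # Rank columns, restricted to the kept rows, and keep the strongest ones.
            cols = sorted(cols, key=lambda c: -l2([W[r][c] for r in rows]))[:keep_c]
            rows.sort()
            cols.sort()
            dense = [[W[r][c] for c in cols] for r in rows]
            blocks.append((rows, cols, dense))
    return blocks


def csb_matvec(blocks: List[Block], x: List[float], n_rows: int) -> List[float]:
    """Compute y = W_pruned @ x; each block is a small, regular dense product."""
    y = [0.0] * n_rows
    for rows, cols, dense in blocks:
        for i, r in enumerate(rows):
            y[r] += sum(dense[i][j] * x[c] for j, c in enumerate(cols))
    return y


W = [[1, 0, 0, 2],
     [0, 3, 4, 0],
     [5, 0, 0, 0],
     [0, 0, 0, 6]]
blocks = csb_prune(W, bh=2, bw=2, keep_r=1, keep_c=1)
y = csb_matvec(blocks, [1.0, 1.0, 1.0, 1.0], n_rows=4)
print(y)  # [0.0, 7.0, 5.0, 6.0]
```

For simplicity every block here keeps the same `keep_r` × `keep_c` shape, so all blocks do equal work. If each block were instead pruned to its own size, blocks would carry unequal work, which is the block-workload imbalance the abstract says the proposed hardware architecture addresses.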
Revised: February 11, 2021
Published: June 29, 2020
Citation
Shi R., P. Dong, T. Geng, Y. Ding, X. Ma, H. So, M. Herbordt, et al. 2020. CSB-RNN: A Faster-Than-Realtime RNN Acceleration Framework with Compressed Structured Blocks. In Proceedings of the 34th International Conference on Supercomputing (ICS 2020), June 29-July 2, 2020, Barcelona, Spain, Article No. 24. New York, New York: Association for Computing Machinery. PNNL-SA-150973. doi:10.1145/3392717.3392749