In this paper, we propose FPDeep, a framework that uses a hybrid of model and layer parallelism to configure distributed reconfigurable clusters for training DNNs. This approach has several benefits. First, the design does not suffer from batch-size growth. Second, a novel workload and weight partitioning scheme balances both across nodes. And third, the entire system is a fine-grained pipeline, which yields high parallelism and utilization and also minimizes the time features must be cached while waiting for back-propagation.
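To make the balanced, layer-wise partitioning idea concrete, here is a minimal sketch, not FPDeep's actual algorithm: it assigns contiguous runs of DNN layers to pipeline nodes so that per-node work is as even as possible. The per-layer cost estimates, node count, and the classic linear-partition formulation below are all illustrative assumptions, not values or methods from the paper.

```python
def balanced_layer_partition(layer_costs, num_nodes):
    """Split layers into `num_nodes` contiguous groups, minimizing the
    maximum per-group cost (linear partition via binary search on the cap)."""
    def fits(cap):
        # Greedily pack layers left to right under the given per-group cap.
        groups, running = 1, 0
        for c in layer_costs:
            if running + c > cap:
                groups, running = groups + 1, c
            else:
                running += c
        return groups <= num_nodes

    # The optimal cap lies between the largest single layer and the total.
    lo, hi = max(layer_costs), sum(layer_costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1

    # Rebuild the actual partition at the optimal cap.
    cap, parts, current, running = lo, [], [], 0
    for i, c in enumerate(layer_costs):
        if running + c > cap and current:
            parts.append(current)
            current, running = [], 0
        current.append(i)
        running += c
    parts.append(current)
    return parts

# Hypothetical per-layer FLOP estimates for a small CNN, split over 3 nodes.
costs = [120, 340, 560, 480, 300, 90]
print(balanced_layer_partition(costs, 3))  # -> [[0, 1], [2], [3, 4, 5]]
```

Under this formulation, each node's stage holds only its own layers' weights, and evening out per-stage cost is what keeps a fine-grained pipeline from stalling on its slowest stage.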
Revised: October 27, 2020
Published: August 1, 2020
Citation
Wang, T., T. Geng, A. Li, X. Jin, and M. Herbordt. 2020. "FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters." IEEE Transactions on Computers 69, no. 8: 1143-1158. PNNL-SA-140455. doi:10.1109/TC.2020.3000118