As supercomputers continue to grow to exa-scale, the amount of data that needs to be saved or transmitted is exploding. To address this, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance. However, little work has been done on effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design of an efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ) to accelerate parallel I/O. Our contribution is fourfold: (1) We propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline (from a variety of representative scientific datasets). (2) We derive a theoretical analysis to support precise control of the compression ratio under an error-bounded compression mode, enabling accurate offline Huffman codeword generation. This also helps us create a fixed-ratio compression mode for consistent throughput. (3) We develop an efficient compression pipeline by adapting cuSZ’s dual-quantization algorithm to our hardware use case. (4) We evaluate CEAZ on five real-world datasets with both a single FPGA board and 256 nodes from the Bridges2 supercomputer. Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2× in throughput and 9.6× in compression ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 32.7× and 31.4×, respectively.
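To make the error-bounded, dual-quantization idea mentioned in the abstract more concrete, the following is a minimal software sketch in Python. It is not the CEAZ hardware pipeline: the 1D previous-value predictor, the sample data, and the function names `dual_quantize` and `reconstruct` are assumptions chosen for illustration. It shows the general two-step scheme of first snapping each value to an integer grid of width twice the error bound, then predicting on those integers so the resulting small codes can be passed to an entropy coder such as Huffman.

```python
def dual_quantize(data, error_bound):
    """Return integer codes for `data` under an absolute error bound.

    Step 1 (prequantization): map each value to the nearest multiple
    of 2 * error_bound, so the rounding error is at most error_bound.
    Step 2 (postquantization): store the difference between each
    prequantized integer and the previous one (a simple 1D predictor,
    assumed here for illustration).
    """
    step = 2.0 * error_bound
    prequantized = [round(x / step) for x in data]

    codes = []
    previous = 0
    for q in prequantized:
        codes.append(q - previous)  # small integers for smooth data
        previous = q
    return codes


def reconstruct(codes, error_bound):
    """Invert dual_quantize: prefix-sum the codes, then rescale."""
    step = 2.0 * error_bound
    restored = []
    running = 0
    for c in codes:
        running += c
        restored.append(running * step)
    return restored


# Hypothetical sample values, not taken from the paper's datasets.
data = [0.101, 0.123, 0.154, 0.192, 0.243, 0.301]
eb = 0.01

codes = dual_quantize(data, eb)
restored = reconstruct(codes, eb)
max_err = max(abs(a - b) for a, b in zip(data, restored))
print(codes)
print(max_err <= eb * (1 + 1e-9))
```

Because the prediction runs on already-quantized integers, each value's code depends only on exact integer arithmetic, which removes the read-after-write dependency on reconstructed floating-point values and is what makes this style of quantization attractive for pipelined or parallel hardware.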
Published: September 21, 2022
Citation
Zhang C., S. Jin, T. Geng, J. Tian, A. Li, and D. Tao. 2022. CEAZ: Accelerating Parallel I/O Via Hardware-Algorithm Co-Designed Adaptive Lossy Compression. In Proceedings of the 36th ACM International Conference on Supercomputing (ICS 2022), June 28-30, 2022, Virtual, Online, Paper No.: 12. New York, New York: Association for Computing Machinery. PNNL-SA-161283. doi:10.1145/3524059.3532362