April 26, 2025
Conference Paper
Shifting Between Compute and Memory Bounds: A Compression-Enabled Roofline Model
Abstract
In the evolving landscape of high-performance computing, particularly as Moore's Law and Dennard scaling come to an end, the ability to shift between compute-bound and memory-bound states is critical for adapting to diverse system and domain-specific architectures. This capability is vital for optimizing performance across distinct hardware configurations, such as accelerators, memory hierarchies, and cache systems. Although ad hoc optimization techniques, such as compressed and approximate computation, have been applied to compute- and data-intensive workloads to improve performance in distinct hardware settings, there is still little understanding of 1) the rationale behind the performance improvement; 2) the capabilities of different optimizations; and 3) which optimization to apply in response to specific computational and memory demands. This work proposes a compression-enabled roofline model that facilitates this adaptability, using data compression techniques to balance and transform between computational and memory demands. The model enables applications to adjust to the specific strengths and limitations of the underlying hardware and system in order to optimize resource utilization. The effectiveness of this approach is demonstrated with matrix multiplication kernels on different input sizes, with various compression techniques turned on and off, including 1) low-precision floating point; 2) sparse matrix formulation; and 3) compressed arrays with ZFP. By reducing memory transfer volumes and cache misses and increasing data locality and computational intensity through compression, the specific roofline model can shift between compute and memory bounds to align more efficiently with system capabilities. This advancement not only improves overall performance but also maximizes adaptability in diverse computing environments.
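The shift between memory and compute bounds described in the abstract can be illustrated with the classic roofline bound, in which attainable performance is the minimum of peak compute throughput and arithmetic intensity times memory bandwidth. The sketch below is not the paper's model: the machine numbers are hypothetical, the traffic estimate for matrix multiplication assumes each matrix crosses the memory interface exactly once (no cache effects), and any cost of compression or decompression is ignored. The names `Machine`, `matmul_intensity`, `attainable_gflops`, and `bound` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters (illustrative values, not measured)."""
    peak_gflops: float    # peak compute throughput in GFLOP/s
    bandwidth_gbs: float  # peak memory bandwidth in GB/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
        return self.peak_gflops / self.bandwidth_gbs


def matmul_intensity(n: int, bytes_per_element: float) -> float:
    """Arithmetic intensity of an n x n matrix multiplication C = A * B.

    Assumes 2*n^3 floating-point operations and idealized traffic in which
    A, B, and C each move through memory once. A smaller bytes_per_element
    stands in for lower precision or a compressed representation.
    """
    flops = 2.0 * n ** 3
    bytes_moved = 3.0 * n ** 2 * bytes_per_element
    return flops / bytes_moved


def attainable_gflops(machine: Machine, intensity: float) -> float:
    """Classic roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(machine.peak_gflops, intensity * machine.bandwidth_gbs)


def bound(machine: Machine, intensity: float) -> str:
    """Classify a kernel as memory-bound or compute-bound on this machine."""
    return "compute" if intensity >= machine.ridge_point else "memory"


if __name__ == "__main__":
    machine = Machine(peak_gflops=1000.0, bandwidth_gbs=100.0)
    for label, width in (("FP64", 8), ("FP32", 4)):
        ai = matmul_intensity(64, width)
        print(f"{label}: intensity={ai:.2f} FLOP/byte, "
              f"bound={bound(machine, ai)}, "
              f"attainable={attainable_gflops(machine, ai):.1f} GFLOP/s")
```

Under these assumptions, halving the bytes per element doubles the arithmetic intensity of the same kernel, which is enough to move a 64 x 64 multiplication from the memory-bound side of the ridge point to the compute-bound side on this hypothetical machine. A realistic model would also need to account for cache behavior and the extra operations that decompression adds.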