How much can we gain from Tensor Kernel Fusion on GPUs?

November 18, 2024

Journal Article

Abstract

Kernel fusion, which merges two or more consecutive kernels into a single large kernel, is an important and fundamental optimization technique for (GP)GPU applications, especially deep neural networks. Kernel fusion can effectively reduce slow off-chip memory accesses by keeping the intermediate results between two successive kernels in fast on-chip memory (e.g., shared memory), which can potentially improve performance and reduce energy consumption. GPU kernels can usually be categorized into tensor operations and element operations. In the deep learning field, fusing a tensor operation kernel with a successive element operation kernel (e.g., fusing Convolution with ReLU) has been widely used to achieve better performance. Recently, fusing two tensor kernels has demonstrated benefits in some applications. However, fusing two tensor kernels is not trivial. Its advantages and limitations remain unclear, and several questions need to be answered: 1) What are the benefits of tensor kernel fusion on GPGPUs? 2) What are the limitations of tensor kernel fusion, and why has it not been widely used? 3) What are the practical scenarios for using tensor kernel fusion? To answer these questions, we conduct both analytical and experimental studies of tensor kernel fusion on commonly used NVIDIA Tensor Core GPUs. Our experiments are based on the industry-level kernel library CUTLASS with extensions. Our study shows that the size of shared memory is the major limiting factor of tensor kernel fusion. Furthermore, mixed-precision tensor kernel fusion can be more beneficial than uniform-precision fusion. For certain practical scenarios, tensor kernel fusion can achieve a 1.3x to 2.0x speed-up compared to the baseline non-fused implementation.
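
The trade-off described in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not the authors' analysis. It assumes two chained matrix multiplications, D = (A x B) x C, an idealized memory model in which every operand crosses the off-chip boundary exactly once, and a fused kernel that must keep a tile of the intermediate result on chip. The function names, the problem sizes, and the 48 KiB shared memory budget are hypothetical values chosen for illustration.

```python
def gemm_chain_traffic(m, k, n, p, bytes_per_elem=2):
    """Idealized off-chip traffic in bytes for D = (A @ B) @ C.

    A is m x k, B is k x n, C is n x p. Assumption: each operand is
    moved to or from off-chip memory exactly once per kernel.
    """
    a, b, c = m * k, k * n, n * p
    t, d = m * n, m * p  # t is the intermediate A @ B
    # Unfused: kernel 1 writes the intermediate, kernel 2 reads it back.
    unfused = (a + b + t) + (t + c + d)
    # Fused: the intermediate never leaves the chip.
    fused = a + b + c + d
    return unfused * bytes_per_elem, fused * bytes_per_elem


def intermediate_tile_fits(tile_m, n, smem_bytes, bytes_per_elem=2):
    """Check whether a tile_m x n tile of the intermediate fits on chip.

    Assumption: the fused kernel holds full rows of the intermediate,
    so the tile width equals n.
    """
    return tile_m * n * bytes_per_elem <= smem_bytes


# Hypothetical tall-and-skinny problem with 2-byte (half precision) elements.
unfused, fused = gemm_chain_traffic(m=4096, k=64, n=64, p=64)
print(f"unfused: {unfused} bytes, fused: {fused} bytes")

# Hypothetical 48 KiB shared memory budget per thread block.
SMEM_BUDGET = 48 * 1024
print(intermediate_tile_fits(128, 64, SMEM_BUDGET))    # narrow intermediate
print(intermediate_tile_fits(128, 1024, SMEM_BUDGET))  # wide intermediate
```

Under these assumptions, the traffic saved by fusion is the write and re-read of the intermediate, while the on-chip footprint grows with the width of the intermediate, which is consistent with shared memory size acting as a limiting factor. Lowering `bytes_per_elem` for the intermediate shrinks that footprint, which is one plausible reading of why mixed precision can help.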

Published: November 18, 2024

Citation

Sun W., A. Li, S. Stuijk, and H. Corporaal. 2024. How much can we gain from Tensor Kernel Fusion on GPUs? IEEE Access 12, no. _:126135-126144. PNNL-SA-186332. doi:10.1109/ACCESS.2024.3411473

Research topics

High-Performance Computing
