An important feature of modern GPU architectures is variable occupancy. Occupancy measures the ratio between the number of active threads on a GPU and the maximum number of threads the GPU hardware can schedule. High occupancy allows a large number of threads to run simultaneously and hide memory latency, but may increase resource contention. Low occupancy reduces resource contention, but is also less capable of hiding latency. Occupancy tuning is an important and challenging problem: the running time of a program can differ by a factor of three to four across occupancy levels. Occupancy tuning for GPU programs has so far received limited exploration. We introduce Orion, the first GPU occupancy tuning framework.
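To make the occupancy ratio defined above concrete, the following is a minimal sketch of how a theoretical occupancy could be estimated from a kernel's per-thread and per-block resource usage. It is not Orion's method. The `SMLimits` class, the function name, and all hardware limits are hypothetical values chosen for illustration, and the sketch ignores the allocation granularity that real hardware applies to registers and shared memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Hypothetical per-multiprocessor limits (illustrative, not from the paper)."""
    max_threads: int = 2048        # maximum resident threads
    max_blocks: int = 32           # maximum resident thread blocks
    registers: int = 65536         # register file size, in registers
    shared_mem_bytes: int = 98304  # shared memory capacity, in bytes


def theoretical_occupancy(threads_per_block, regs_per_thread,
                          smem_per_block, sm=SMLimits()):
    """Return active threads / maximum schedulable threads for one multiprocessor."""
    # Number of blocks that fit under each individual resource limit.
    by_threads = sm.max_threads // threads_per_block
    by_regs = (sm.registers // (regs_per_thread * threads_per_block)
               if regs_per_thread else sm.max_blocks)
    by_smem = (sm.shared_mem_bytes // smem_per_block
               if smem_per_block else sm.max_blocks)

    # The scarcest resource determines how many blocks are resident.
    blocks = min(sm.max_blocks, by_threads, by_regs, by_smem)
    return blocks * threads_per_block / sm.max_threads


if __name__ == "__main__":
    # Doubling register use per thread halves occupancy under these limits.
    print(theoretical_occupancy(256, 32, 0))
    print(theoretical_occupancy(256, 64, 0))
```

Under these assumed limits, raising per-thread register use or per-block shared memory lowers occupancy, which illustrates the trade-off between per-thread resources and latency hiding that the abstract describes.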
Revised: January 15, 2020 | Published: December 12, 2016
Citation
Hayes A., L. Li, D.G. Chavarria, S. Song, and E. Zhang. 2016. ORION: A Framework for GPU Occupancy Tuning. In Proceedings of the 17th Middleware Conference (Middleware 2016), December 12-16, 2016, Trento, Italy, 1-13; Article No. 18. New York, New York: ACM. PNNL-SA-120583. doi:10.1145/2988336.2988355