This paper presents novel cache optimizations for massively parallel, throughput-oriented architectures such as GPUs. Based on the reuse characteristics of GPU workloads, we propose a design that integrates an efficient locality-filtering capability into the decoupled tag store of the existing L1 D-cache through simple, cost-effective hardware extensions.
Revised: July 13, 2015
Published: June 7, 2015
Citation
Li, C., S. Song, H. Dai, A. Sidelnik, S. Hari, and H. Zhou. 2015. Locality-Driven Dynamic GPU Cache Bypassing. In Proceedings of the 29th ACM on International Conference on Supercomputing (ICS 2015), June 8-11, 2015, Newport Beach, California, 66-77. New York, New York: ACM. PNNL-SA-109271. doi:10.1145/2751205.2751237