Stencil computation is a classic computational kernel found in many high-performance scientific applications, such as image processing and partial differential equation (PDE) solvers. A stencil computation sweeps over a multi-dimensional grid and repeatedly updates the value at each point using the values of neighboring points. Stencil computations often operate on large datasets that exceed cache capacity, leading to excessive accesses to the memory subsystem. As a result, 3D stencil computations on large grids are memory-bound.
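To make the access pattern concrete, the following is a minimal sketch of one kind of 3D stencil sweep: a 7-point Jacobi-style average over a grid stored as nested lists. The averaging rule, the function names `jacobi_7pt` and `make_grid`, and the fixed-boundary handling are assumptions chosen for illustration; they are not taken from the paper and do not represent the PIMS design or the kernels it evaluates.

```python
def make_grid(n, value=0.0):
    """Build an n x n x n grid (nested lists) filled with `value`."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def jacobi_7pt(grid, sweeps=1):
    """Run `sweeps` passes of a 7-point averaging stencil over a 3D grid.

    Illustrative only: each interior point becomes the mean of itself and
    its six face neighbors. Boundary points are left unchanged.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    for _ in range(sweeps):
        # Write into a copy so every update reads only previous-sweep values.
        new = [[row[:] for row in plane] for plane in grid]
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                for k in range(1, nz - 1):
                    new[i][j][k] = (
                        grid[i][j][k]
                        + grid[i - 1][j][k] + grid[i + 1][j][k]
                        + grid[i][j - 1][k] + grid[i][j + 1][k]
                        + grid[i][j][k - 1] + grid[i][j][k + 1]
                    ) / 7.0
        grid = new
    return grid
```

Each interior update reads seven values and writes one, and neighboring points reread most of the same inputs. When the grid is larger than the cache, those rereads become repeated memory accesses, which is the traffic the abstract describes.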
In this paper we present PIMS, an in-memory accelerator for stencil computations. PIMS, implemented in the logic layer of a 3D-stacked memory, exploits the high bandwidth provided by through-silicon vias to reduce redundant memory traffic. Our comprehensive evaluation, using three different grid sizes with six categories of orders, indicates that the proposed architecture reduces data movement by 48.25% on average and reduces bank conflicts by up to 65.55%.
Revised: December 18, 2019
Published: November 1, 2019
Citation
Li J., W. Xi, A. Tumeo, B. Williams, J.D. Leidel, and Y. Chen. 2019. PIMS: A Lightweight Processing-in-Memory Accelerator for Stencil Computations. In The International Symposium on Memory Systems (MEMSYS 2019), September 30-October 3, 2019, Washington DC, 41-52. New York, New York: ACM. PNNL-SA-146976. doi:10.1145/3357526.3357550