U.S. Department of Energy
Science Directorate

Advanced Computing, Mathematics and Data
Staff Awards & Honors

December 2016

Changing the Game

Song's work makes the cut in two major high-performance computing conferences

Shuaiwen Leon Song

Congratulations to Shuaiwen Leon Song, a research scientist with PNNL’s High Performance Computing group, who recently had two papers accepted at high-level conferences that focus on diverse and leading-edge research related to high-performance computer architectures.

The first paper, “Processing-in-Memory Enabled Graphics Processors for 3D Rendering,” was accepted for presentation at the 23rd IEEE International Symposium on High-Performance Computer Architecture, or HPCA-23, to be held in February 2017 in Austin, Texas. HPCA-23 is a leading forum for scientists and engineers to present their work on computer architectures. Only 50 papers were selected for the conference’s main session presentations.


The work, coauthored with scientists from the University of Houston and the Beijing Advanced Innovation Center for Imaging Technology, examines three-dimensional (3D) gaming on modern computer systems, where immersive graphics and intense imaging come at the cost of memory bandwidth. Song and his coauthors employed two architectural designs to enable processing-in-memory-based graphics processing units (GPUs) for efficient 3D rendering. Using well-known games, including Doom 3 and Half-Life 2, the team demonstrated that their designs could improve texture filtering performance by up to 6.4 times and 3D rendering by up to 65 percent over baseline GPUs. The designs were also shown to significantly reduce memory traffic and energy consumption without sacrificing rendering quality.

“As part of this work, we explored the Hybrid Memory Cube, or HMC, a type of stacked memory in the GPU, to efficiently process high-performance graphics applications and alter their pipeline to significantly reduce memory access,” Song explained. “We also enabled an approximate computing strategy with the new design.”

In April 2017, Song will present “Locality-Aware CTA Clustering For Modern GPUs,” coauthored with Ang Li, a Ph.D. intern also with PNNL’s High Performance Computing group, at the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, known as ASPLOS 2017, to be held in Xi’an, China. ASPLOS showcases “groundbreaking research at the intersection of at least two disciplines: architecture, programming languages, operating systems, and related areas,” making this acceptance especially noteworthy for Song.

Ang Li, a Ph.D. intern at PNNL, coauthored the paper accepted by ASPLOS.

“This year, the conference only accepted 56 out of 321 papers submitted, which is a 17.4 percent acceptance rate,” Song noted. “Both the number of papers submitted and accepted are ASPLOS records. It is exciting for this paper, which engages the intersection of architecture and runtime/compiler techniques, to be counted among them.”

For ASPLOS, Li and Song will describe their novel clustering technique for tapping into the performance potential of a largely ignored type of locality: locality between cooperative thread arrays (CTAs), or inter-CTA locality. In GPU computing, a CTA is a group of threads that execute the same program on an input data set to produce an output data set. The threads can run concurrently or in parallel, and threads within a CTA can communicate with each other.

Their paper describes the concept, method, and design for a “CTA Clustering” framework that automatically exploits inter-CTA locality for general applications. Song and his colleagues designed the framework to be integrated as part of the compiler and immediately deployable on commodity GPUs. The paper also describes how they validated the method on NVIDIA GPU architectures (Fermi, Kepler, Maxwell, and Pascal) and showed significant speedups from improved L1 hit rates and reduced L2 transactions.
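To make the idea of inter-CTA locality more concrete, the following is a minimal toy model in Python. It is not the authors' framework and does not reflect how real GPU hardware schedules CTAs. It assumes a hypothetical 1D stencil in which each CTA reads its own data tile plus the two neighboring tiles, a fixed number of streaming multiprocessors (SMs) each with an unbounded private L1 cache, and two illustrative CTA-to-SM mappings. All names (`tiles_read`, `l2_fetches`, `round_robin`, `clustered`) are invented for this sketch.

```python
from collections import defaultdict

NUM_CTAS = 64  # assumed number of CTAs in the toy kernel
NUM_SMS = 4    # assumed number of SMs, each with its own L1


def tiles_read(cta_id, num_ctas):
    """Tiles a CTA touches in the toy stencil: its own tile plus both neighbors."""
    return {t for t in (cta_id - 1, cta_id, cta_id + 1) if 0 <= t < num_ctas}


def l2_fetches(assignment, num_ctas):
    """Count tile fetches from L2, assuming an unbounded private L1 per SM."""
    cached = defaultdict(set)  # SM id -> tiles already held in that SM's L1
    fetches = 0
    for cta_id in range(num_ctas):
        sm = assignment(cta_id)
        for tile in tiles_read(cta_id, num_ctas):
            if tile not in cached[sm]:  # L1 miss: the tile must come from L2
                cached[sm].add(tile)
                fetches += 1
    return fetches


def round_robin(cta_id):
    """Neighboring CTAs land on different SMs, so shared tiles are fetched twice."""
    return cta_id % NUM_SMS


def clustered(cta_id):
    """Neighboring CTAs land on the same SM, so shared tiles are reused in L1."""
    return cta_id // (NUM_CTAS // NUM_SMS)


rr = l2_fetches(round_robin, NUM_CTAS)
cl = l2_fetches(clustered, NUM_CTAS)
print(f"round-robin L2 fetches: {rr}, clustered L2 fetches: {cl}")
```

In this toy setup the clustered mapping fetches 70 tiles from L2 versus 190 for the round-robin mapping, because CTAs that share tiles also share an L1. The numbers only illustrate why grouping CTAs with overlapping data can raise L1 hit rates and cut L2 transactions; they say nothing about the speedups reported in the paper.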

“CENATE [PNNL’s Center for Advanced Technology Evaluation] gave me a lot of flexibility to pursue this line of important research,” Song added. “I look forward to showcasing it at ASPLOS.”


  • Li A and SL Song. 2017. “Locality-Aware CTA Clustering For Modern GPUs.” To be presented at: 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2017). April 08-12, 2017, Xi’an, China.
  • Xie C, SL Song, J Wang, W Zhang, and X Fu. 2017. “Processing-in-Memory Enabled Graphics Processors for 3D Rendering.” To be presented at: 23rd IEEE International Symposium on High-Performance Computer Architecture (HPCA-23). February 04-08, 2017, Austin, Texas.
