December 1, 2016
Feature

Changing the Game

Song's work makes the cut in two major high-performance computing conferences

Shuaiwen Leon Song

Congratulations to Shuaiwen Leon Song, a research scientist with PNNL’s High Performance Computing group, who recently had two papers accepted at premier conferences focused on diverse, leading-edge research in high-performance computer architecture.

First, the paper “Processing-in-Memory Enabled Graphics Processors for 3D Rendering” was accepted for presentation at the 23rd IEEE International Symposium on High-Performance Computer Architecture, or HPCA-23, to be held in February 2017 in Austin, Texas. HPCA is a leading forum for scientists and engineers to present diverse work on computer architectures. Only 50 papers were selected for this year’s main-session presentations.

The work, coauthored with scientists from the University of Houston and the Beijing Advanced Innovation Center for Imaging Technology, examines the three-dimensional (3D) gaming experience on modern computer systems, which gives today’s gamers immersive graphics and intense imaging at the cost of heavy memory bandwidth demands. Song and his coauthors developed two architectural designs that enable processing-in-memory-based graphics processing units (GPUs) to render 3D scenes efficiently. Using well-known games, including Doom 3 and Half-Life 2, the team demonstrated that their designs improve texture filtering performance by up to 6.4 times and overall 3D rendering performance by up to 65 percent over baseline GPUs. The designs also significantly reduce memory traffic and energy consumption without sacrificing rendering quality.

“As part of this work, we explored the Hybrid Memory Cube, or HMC, a type of stacked memory in the GPU, to efficiently process high-performance graphics applications and alter their pipeline to significantly reduce memory access,” Song explained. “We also enabled an approximate computing strategy with the new design.”

In April 2017, Song will present his work “Locality-Aware CTA Clustering For Modern GPUs,” coauthored with Ang Li, a Ph.D. intern also with PNNL’s High Performance Computing group, at the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, known as ASPLOS 2017, being held in Xi’an, China. ASPLOS showcases “groundbreaking research at the intersection of at least two disciplines: architecture, programming languages, operating systems, and related areas,” making this acceptance especially noteworthy for Song.

Ang Li, a Ph.D. intern at PNNL, co-authored the paper accepted by ASPLOS.

“This year, the conference only accepted 56 out of 321 papers submitted, which is a 17.4 percent acceptance rate,” he noted. “Both the number of papers submitted and accepted are ASPLOS records. It is exciting for this paper, which engages the intersection of architecture and runtime/compiler techniques, to be counted among them.”

For ASPLOS, Li and Song will describe their novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-cooperative thread array (CTA) locality. On a GPU, CTAs are groups of threads that each execute the same program on a portion of an input data set to produce an output data set. CTAs can run concurrently or in parallel, and the threads within a CTA can communicate with one another, as the sketch below illustrates.
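To make the CTA idea concrete, the following minimal CUDA sketch (not taken from the paper; the kernel name, tile size, and data are illustrative) shows how each thread block, i.e., each CTA, runs the same program on its slice of the input, stages data in shared memory, and synchronizes so that threads within the block can exchange values before writing the output.

    // Illustrative example of a cooperative thread array (CTA): every thread in
    // a block runs the same program on its own element, and the block's shared
    // memory plus __syncthreads() provide intra-CTA communication.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define TILE 256   // threads per CTA (illustrative size)

    __global__ void cta_example(const float *in, float *out, int n)
    {
        __shared__ float tile[TILE];                 // visible to all threads in this CTA
        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;  // each thread loads its element
        __syncthreads();                                 // intra-CTA synchronization point

        if (gid < n) {
            int neighbor = (threadIdx.x + 1) % blockDim.x;   // read a value loaded by another thread
            out[gid] = 0.5f * (tile[threadIdx.x] + tile[neighbor]);
        }
    }

    int main()
    {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = (float)i;

        // Many CTAs are launched at once; the hardware schedules them
        // concurrently across the GPU's streaming multiprocessors.
        cta_example<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n);
        cudaDeviceSynchronize();

        printf("out[0] = %f\n", out[0]);
        cudaFree(in);
        cudaFree(out);
        return 0;
    }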

Their paper describes the concept, method, and design of a “CTA Clustering” framework that automatically exploits inter-CTA locality for general applications. Song and his colleagues designed the framework to be integrated into the compiler and immediately deployable on commodity GPUs. The paper also describes how they validated their method on NVIDIA GPU architectures (Fermi, Kepler, Maxwell, and Pascal), showing significant performance speedups achieved by improving L1 cache hit rates and reducing L2 transactions.
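The clustering framework itself lives in the compiler, but the kind of reuse it targets is easy to picture. The hypothetical stencil kernel below (again illustrative, not the authors' code) shows neighboring CTAs loading the same halo elements; when such CTAs execute close together on the same streaming multiprocessor, those shared loads can hit in the L1 cache rather than generating additional L2 transactions, which is the inter-CTA locality the framework is designed to capture.

    // Hypothetical 1D three-point stencil used only to show inter-CTA locality:
    // the halo elements loaded by block b (in[start-1] and in[start+TILE]) are
    // the same elements loaded by blocks b-1 and b+1, so CTAs scheduled near
    // each other can reuse them from L1 instead of going back to L2/DRAM.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define TILE 256

    __global__ void stencil3(const float *in, float *out, int n)
    {
        __shared__ float s[TILE + 2];                    // tile plus left/right halo
        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        // Interior load: each thread brings in its own element.
        s[threadIdx.x + 1] = (gid < n) ? in[gid] : 0.0f;

        // Halo loads: these elements also belong to the neighboring CTAs' tiles.
        if (threadIdx.x == 0)
            s[0] = (gid > 0) ? in[gid - 1] : 0.0f;
        if (threadIdx.x == blockDim.x - 1)
            s[TILE + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
        __syncthreads();

        if (gid < n)
            out[gid] = (s[threadIdx.x] + s[threadIdx.x + 1] + s[threadIdx.x + 2]) / 3.0f;
    }

    int main()
    {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = (float)(i % 7);

        stencil3<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("out[1] = %f\n", out[1]);

        cudaFree(in);
        cudaFree(out);
        return 0;
    }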

“CENATE [PNNL’s Center for Advanced Technology Evaluation] gave me a lot of flexibility to pursue this line of important research,” Song added. “I look forward to showcasing it at ASPLOS.”

References:

  • Li A and SL Song. 2017. “Locality-Aware CTA Clustering For Modern GPUs.” To be presented at: 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2017), April 8-12, 2017, Xi’an, China.
  • Xie C, SL Song, J Wang, W Zhang, and X Fu. 2017. “Processing-in-Memory Enabled Graphics Processors for 3D Rendering.” To be presented at: 23rd IEEE International Symposium on High-Performance Computer Architecture (HPCA-23), February 4-8, 2017, Austin, Texas.

###

About PNNL

Pacific Northwest National Laboratory draws on its distinguishing strengths in chemistry, Earth sciences, biology and data science to advance scientific knowledge and address challenges in sustainable energy and national security. Founded in 1965, PNNL is operated by Battelle for the Department of Energy’s Office of Science, which is the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://www.energy.gov/science/. For more information on PNNL, visit PNNL's News Center. Follow us on Twitter, Facebook, LinkedIn and Instagram.

Published: December 1, 2016