PNNL @ SC21
PNNL computer scientists will be presenting the latest research in high-performance computing (HPC) at SC21, the International Conference for High Performance Computing, Networking, Storage, and Analysis.
The SC21 conference will feature tutorials, workshops, and presentations on HPC, machine learning, artificial intelligence, modeling and simulation, and the application of these capabilities to accelerate scientific discovery. The full program of presentations is planned for November 14–19. Learn more about Pacific Northwest National Laboratory's (PNNL's) presence at SC21 below.
DOE @ SC21
If you're attending SC21 in person, be sure to visit the Department of Energy (DOE) booth. You can learn more about DOE's presence at SC21 at scdoe.info.
Meet a Recruiter
Curious about open HPC or computational sciences positions at PNNL? Bring your questions to our recruiting team. They're holding a virtual Q&A on Thursday, November 18 from 9:00 to 11:00 a.m. Pacific Time/11:00 a.m. to 1:00 p.m. Central Time on Zoom. Come learn more about open positions and what it's like to work at PNNL. We can’t wait to meet you!
PNNL Speakers, Presentations, and Posters
November 14, 2021
Toward Modern C++ Language Support for MPI
Sayan Ghosh, Andrew Lumsdaine (University of Washington Joint Appointee)
Presentation | 12:00 p.m. – 12:30 p.m. PST/2:00 p.m. – 2:30 p.m. CST
The C++ programming language has made significant inroads in improving performance and productivity across a broad spectrum of applications and hardware. The C++ language bindings to the Message Passing Interface (MPI) were removed in MPI 3.0 on the rationale that they added minimal functionality over the existing C bindings, relative to modern C++ practice, while imposing a significant maintenance burden on the MPI standard specification. READ MORE
A High-Performance Sparse Tensor Algebra Compiler in MLIR
Ruiqin Tian, Luanzheng Guo, Gokcen Kestor
Presentation | 1:30 p.m. – 2:10 p.m. PST/3:30 p.m. – 4:10 p.m. CST
Sparse tensor algebra is widely used in many applications. The performance of sparse tensor algebra kernels strongly depends on the characteristics of the input tensors. Therefore, many storage formats are designed for tensors to achieve optimal performance for particular applications and architectures, which makes it challenging to implement and optimize every tensor operation of interest on a given architecture. READ MORE
November 15, 2021
IA3 2021: 11th Workshop on Irregular Applications: Architectures and Algorithms
Antonino Tumeo, Marco Minutoli, Vito Giovanni Castellana, John Feo
Workshop | 7:00 a.m. – 3:30 p.m. PST/9:00 a.m. – 5:30 p.m. CST
Due to the heterogeneous datasets they process, data-intensive applications employ a diverse set of methods and data structures, exhibiting irregular memory accesses, control flows, and communication patterns. Current supercomputing systems are organized around components optimized for data locality and bulk synchronous computations. READ MORE
HPC Graph Toolkits and the GraphBLAS Forum
Antonino Tumeo, John Feo, Mahantesh Halappanavar
Presentation | 3:15 p.m. – 4:45 p.m. PST/5:15 p.m. – 6:45 p.m. CST
HPC systems are diverse. Programmers can’t afford to customize software from scratch for each case. We need frameworks that hide hardware behind high-level abstractions. Workflows are complex with graphs, databases, simulations, machine learning, and more. READ MORE
November 16, 2021
Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores
Presentation | 1:30 p.m. – 2:00 p.m. PST/3:30 p.m. – 4:00 p.m. CST
Accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions are usually restricted by limited precision support on GPUs. To break such restrictions, we introduce the first arbitrary precision neural network framework to fully exploit quantization benefits on Ampere GPU tensor cores. READ MORE
SODA-OPT: System-Level Design in MLIR for HLS
Nicolas Bohm Agostini, Antonino Tumeo
Poster | 6:30 a.m. – 3:00 p.m. PST/8:30 a.m. – 5:00 p.m. CST
High-level synthesis enables the generation of hardware descriptions from applications implemented with high-level languages. State-of-the-art tools, however, typically require the application to be manually translated to C/C++ and carefully annotated to improve final design performance. READ MORE
Towards a Scalable and Distributed High-Performance SHAD C++ Library
Poster | 6:30 a.m. – 3:00 p.m. PST/8:30 a.m. – 5:00 p.m. CST
SHAD is the Scalable High-performance Algorithms and Data-structures C++ library, providing general purpose building blocks and supporting high-level custom utilities. SHAD is designed with scalability, flexibility, productivity, and portability in mind, and serves as a playground for research in parallel programming models, runtime systems, and their applications. READ MORE
FPGA-Accelerated Ripples
Reece Neff, Marco Minutoli, Antonino Tumeo
Poster* | 6:30 a.m. – 3:00 p.m. PST/8:30 a.m. – 5:00 p.m. CST
Influence maximization is an important graph algorithm that is gaining traction in areas where social networks and other related graphs are processed and analyzed. The algorithm's long run time opens the door for optimization, but the algorithm is challenging to parallelize and port onto novel architectures because of its irregular and memory-hungry behavior. READ MORE
*This poster is one of four finalists for the Best Research Poster award. There will be a 12-minute presentation on November 17 in the Best Research Poster presentation session from 8:30 a.m. – 8:50 a.m. PST/10:30 a.m. – 10:50 a.m. CST.
Breadth-First Search on Xilinx Versal
Guilherme Prado Alves, Marco Minutoli, Antonino Tumeo
Poster | 6:30 a.m. – 3:00 p.m. PST/8:30 a.m. – 5:00 p.m. CST
The new Xilinx Versal Platform provides a highly heterogeneous system to programmers. How these diverse resources can be utilized effectively is an open question. This project implements breadth-first search on this platform, utilizing all available regions to accelerate this workload. READ MORE
Hardware Acceleration of Complex Machine Learning Models through Modern High-Level Synthesis
Poster | 6:30 a.m. – 3:00 p.m. PST/8:30 a.m. – 5:00 p.m. CST
Machine-learning algorithms continue to receive significant attention from industry and research. As the models increase in complexity and accuracy, their computational and memory demands also grow, pushing for more powerful, heterogeneous architectures. READ MORE
November 17, 2021
Single-Node Partitioned-Memory for Huge Graph Analytics: Cost and Performance Trade-Offs
Sayan Ghosh, Nathan Tallent, Marco Minutoli, Mahantesh Halappanavar, Ananth Kalyanaraman (WSU Joint Appointee)
Presentation | 9:30 a.m. – 12:00 p.m. PST/11:30 a.m. – 2:00 p.m. CST
Because of their cost, nonvolatile memory modules (NVDIMMs) such as Intel Optane are attractive in single-node big-memory systems. We evaluate performance and cost trade-offs when using Optane as volatile memory for huge-graph analytics. READ MORE
Advanced Architecture Testbeds: Community Resources for Enhanced HPC Research
Presentation | 10:15 a.m. – 11:15 a.m. PST/12:15 p.m. – 1:15 p.m. CST
This presentation brings together panelists from advanced architecture testbed efforts, including the Swiss National Supercomputing Centre’s User Lab, PNNL’s Center for Advanced Technology Evaluation (CENATE) testbed, the Heterogeneous Advanced Architecture Platforms at Sandia National Laboratories, the Rogues Gallery at Georgia Tech, the Experimental Computing Lab at Oak Ridge National Laboratory, and the Maui HPC Center, to discuss next-generation architectures and challenges. READ MORE
Scaling Subgraph Isomorphism on Distributed Multi-GPU Systems Using Trie-Based Data Structure
Arif Khan, Mahantesh Halappanavar, Edoardo Serra (Boise State University Joint Appointee)
Presentation | 2:30 p.m. – 3:00 p.m. PST/4:30 p.m. – 5:00 p.m. CST
Subgraph isomorphism is a pattern-matching problem widely used in many domains such as cheminformatics, bioinformatics, databases, and social network analysis. It is computationally expensive and is a proven NP-hard problem. The massive parallelism in GPUs is well suited for solving subgraph isomorphism. However, current GPU implementations fall far short of the achievable performance. READ MORE
November 18, 2021
Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers
Joshua Suetterlein, Joseph Manzano
Presentation | 1:30 p.m. – 2:00 p.m. PST/3:30 p.m. – 4:00 p.m. CST
Multi-accelerator servers are increasingly being deployed in shared multi-tenant environments (such as in cloud data centers) to meet the demands of large-scale compute-intensive workloads. In addition, these accelerators are increasingly being interconnected in complex topologies and workloads are exhibiting a wider variety of inter-accelerator communication patterns. READ MORE
Scalable PGAS-Based State Vector Simulation of Quantum Circuits
Presentation | 2:00 p.m. – 2:30 p.m. PST/4:00 p.m. – 4:30 p.m. CST
High-performance quantum circuit simulation on classical HPC systems is still necessary in the noisy intermediate-scale quantum era. Observing that the major obstacle to scalable state-vector quantum simulation arises from massively fine-grained, irregular data exchange with remote nodes, in this paper we present a state-vector quantum circuit simulator that applies emerging partitioned global address space (PGAS)-based communication models for efficient large-scale quantum circuit simulation. READ MORE
November 19, 2021
HPC for Urgent Decision Making
Workshop | 6:30 a.m. – 10:00 a.m. PST/8:30 a.m. – 12:00 p.m. CST
In the response to natural disasters, pandemics, and time-sensitive societal issues, technological advances are creating exciting new opportunities with the potential to move HPC beyond traditional computational workloads. Combining high-velocity data and live analytics with HPC models can aid in responding to urgent real-world problems, ultimately saving lives and reducing economic loss. READ MORE
Guarding Numerics Amidst Rising Heterogeneity
Presentation | 7:00 a.m. – 7:20 a.m. PST/9:00 a.m. – 9:20 a.m. CST
New heterogeneous computing platforms—especially GPUs and other accelerators—are being introduced at a brisk pace, motivated by the goals of exploiting parallelism and reducing data movement. Unfortunately, their sheer variety and the optimization options supported by them have been observed to alter the computed numerical results to the extent that reproducible results are no longer possible to obtain without extra effort. READ MORE
RSDHA: Redefining Scalability for Diversely Heterogeneous Architectures
Panel Discussion | 7:10 a.m. – 8:10 a.m. PST/9:10 a.m. – 10:10 a.m. CST
The panel discussion at RSDHA will seek answers to two primary questions:
- How could traditional HPC applications adopt the architectural, programming, and runtime approaches employed by state-of-the-art diversely heterogeneous systems?
- How could diversely heterogeneous architectures for mobile and autonomous systems take examples from traditional HPC to beat the multi-node scalability challenges as they become increasingly connected? READ MORE