Scaling Laws for the Workload Throughput of Emerging Heterogeneous Clusters

August 14, 2025

Conference Paper


Abstract

Next-generation HPC clusters are evolving into highly heterogeneous systems that integrate traditional computing resources with emerging accelerator technologies such as quantum processors, neuromorphic units, dataflow architectures, and specialized AI accelerators within a unified infrastructure. These advanced systems enable workloads to dynamically use different accelerators during different computation phases, creating complex execution patterns. Workload performance may therefore be affected by many factors, including how the accelerators are shared, how heavily they are utilized, and where they are placed within the system. Moreover, the system and network state resulting from the overall system load can significantly affect the job completion rate. Understanding, identifying, and quantifying the impact of the most critical factors (e.g., the number of allocated accelerators) will help guide investment decisions for accelerator acquisition and deployment that can improve overall system throughput. This paper extensively studies the complex interactions between advanced accelerators within an HPC cluster and various workloads. We introduce a novel analytical model that predicts the speedup of a workload given an accelerator/system configuration. This model can be used to quantify the effect of adding accelerators on the performance of jobs running on an HPC cluster. We validate the model in both simulated and real environments.

Published: August 14, 2025

Citation

Alasandagutti A., J.D. Suetterlein, J.S. Firoz, S.J. Young, J.B. Manzano Franco, J.R. Stewart, and P. Bridges, et al. 2025. Scaling Laws for the Workload Throughput of Emerging Heterogeneous Clusters. In IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2025), May 19-22, 2025, Tromso, Norway, 73-82. Piscataway, New Jersey: IEEE. PNNL-SA-208649. doi:10.1109/CCGRID64434.2025.00025

Research topics

High-Performance Computing

