May 16, 2025
Conference Paper

HPC Network Simulation Tuning via Automatic Extraction of Hardware Parameters

Abstract

Popular HPC network interconnect simulators such as SST/macro provide a variety of configurable parameters for exploring the design space of hardware components such as network interface cards (NICs), switches, and the links among them. While such knobs provide flexibility for exploring design trade-offs in novel hardware, manually configuring simulations to match existing hardware, so that the study can focus on topology exploration, is cumbersome and error-prone and can lead to highly inaccurate simulations. This challenge is compounded when the specifications of various (proprietary) technologies are not readily available or are intentionally omitted. In this work, we propose a framework that autotunes the simulation configurations of multiple network models within SST/macro using Tree-structured Parzen Estimator (TPE)-based Bayesian optimization, and we observe the effect on simulation accuracy across different message regimes. These regimes span small to large message sizes, from latency-bound to bandwidth-bound messages. We provide a detailed analysis of the simulation error for four representative HPC systems. Our Bayesian-optimization-based autotuning framework for network models achieves up to a 5x improvement in accuracy over best-effort manual configurations based on available hardware specifications.
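To illustrate the idea of TPE-based tuning, here is a minimal sketch. It is not the authors' framework and not an SST/macro API. It assumes a toy latency-bandwidth cost model as a stand-in for a simulator run, and it uses hypothetical parameter names (`latency_us`, `bandwidth_gbps`), search ranges, and synthetic "measured" times. The loop follows the usual TPE recipe. It first evaluates random configurations. It then splits past trials at quantile `gamma` into "good" and "bad" sets and fits a Gaussian Parzen density to each, l(x) and g(x), per dimension. Candidates are sampled from l(x), and the one that maximizes l(x)/g(x) is evaluated next. The objective is the mean relative error across message sizes that range from latency-bound to bandwidth-bound.

```python
import math
import random

# Hypothetical "true" hardware values used only to synthesize measurements.
TRUE_LATENCY_US = 1.5
TRUE_BANDWIDTH_GBPS = 12.0

# Message sizes (bytes) from latency-bound (small) to bandwidth-bound (large).
SIZES = [8, 64, 512, 4096, 32768, 262144, 2097152]

# Search ranges for (latency_us, bandwidth_gbps); illustrative assumptions.
BOUNDS = [(0.1, 10.0), (1.0, 50.0)]


def model_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Toy latency + size/bandwidth model standing in for a simulator run."""
    return latency_us + size_bytes / (bandwidth_gbps * 1e3)  # 1 GB/s = 1e3 B/us


# Stand-in for timings measured on real hardware.
MEASURED = [model_time_us(s, TRUE_LATENCY_US, TRUE_BANDWIDTH_GBPS) for s in SIZES]


def mean_rel_error(params):
    """Mean relative error of the model against the measurements."""
    lat, bw = params
    errs = [abs(model_time_us(s, lat, bw) - m) / m for s, m in zip(SIZES, MEASURED)]
    return sum(errs) / len(errs)


def kde_bandwidth(lo, hi, n):
    """Heuristic kernel width that shrinks as more observations accumulate."""
    return 0.5 * (hi - lo) / math.sqrt(1.0 + n)


def log_parzen(x, centers, bw):
    """Log density of an equal-weight Gaussian mixture centered on `centers`."""
    logs = [-0.5 * ((x - c) / bw) ** 2 - math.log(bw * math.sqrt(2 * math.pi))
            for c in centers]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs)) - math.log(len(centers))


def tpe_minimize(objective, bounds, n_init=10, n_iter=60, n_candidates=24,
                 gamma=0.25, seed=0):
    """Simplified TPE loop; returns (best_params, best_loss)."""
    rng = random.Random(seed)
    history = []  # list of (params, loss)
    for _ in range(n_init):  # random warm-up trials
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        history.append((x, objective(x)))
    for _ in range(n_iter):
        history.sort(key=lambda h: h[1])
        n_good = max(1, math.ceil(gamma * len(history)))
        good = [h[0] for h in history[:n_good]]
        bad = [h[0] for h in history[n_good:]]
        best_x, best_score = None, -math.inf
        for _ in range(n_candidates):
            cand, score = [], 0.0
            for d, (lo, hi) in enumerate(bounds):  # independent per-dimension densities
                g_c = [x[d] for x in good]
                b_c = [x[d] for x in bad]
                bw_g = kde_bandwidth(lo, hi, len(g_c))
                bw_b = kde_bandwidth(lo, hi, len(b_c))
                # Sample from l(x), clipped to the search range.
                v = min(hi, max(lo, rng.choice(g_c) + rng.gauss(0.0, bw_g)))
                score += log_parzen(v, g_c, bw_g) - log_parzen(v, b_c, bw_b)
                cand.append(v)
            if score > best_score:  # maximize log l(x) - log g(x)
                best_x, best_score = cand, score
        history.append((best_x, objective(best_x)))
    return min(history, key=lambda h: h[1])


if __name__ == "__main__":
    params, loss = tpe_minimize(mean_rel_error, BOUNDS, seed=1)
    print(f"latency_us={params[0]:.3f} bandwidth_gbps={params[1]:.3f} error={loss:.4f}")
```

In practice, a maintained implementation such as Optuna's `TPESampler` would replace the hand-written loop, and each objective evaluation would launch a full simulator run instead of calling the toy model. This sketch also omits TPE refinements such as prior components and adaptive kernel widths.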

Published: May 16, 2025

Citation

Suetterlein J.D., S.J. Young, J.S. Firoz, J.B. Manzano Franco, N.R. Tallent, R.D. Friese, K.J. Barker, et al. 2024. HPC Network Simulation Tuning via Automatic Extraction of Hardware Parameters. In IEEE High Performance Extreme Computing Conference (HPEC 2024), September 23-27, 2024, Wakefield, MA, 1-10. Piscataway, New Jersey: IEEE. PNNL-SA-203658. doi:10.1109/HPEC62836.2024.10938439