March 28, 2025
Conference Paper

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

Abstract

While NVIDIA has been the dominant provider of GPUs for HPC and ML, AMD now offers several GPUs as well. This encourages programmers to try out AMD GPUs for new codes and to port existing codes over. Unfortunately, without understanding the floating-point differences between these GPU types, software development or porting can introduce bugs, and such an understanding is currently lacking. The magnitude of this open question becomes clear if one considers the number of floating-point precision choices (FP16, FP32, etc.), floating-point formats (standard floats, brain-float, etc.), and execution units available (elementary units, matrix/tensor cores, etc.). Questions such as rounding modes and subnormal support are also important. Most of these answers are unknown today or are hard to access. We provide the first testing-guided approach that answers a significant number of these questions. We also devise tests to reveal internal information (e.g., extra bits kept) to make sure that our findings are reliable. Many of our tests employ systematically generated random programs, others apply fast-math flags, and some involve fused multiply-add. Especially for tensor/matrix cores, the tests have nontrivial logic that we present. Our testing approach is reusable for the plethora of GPUs yet to be introduced. Our findings include up to 7 ulps of difference between NVIDIA and AMD for sin and cos at FP32 precision and 3 ulps at FP64. In our study of tensor cores (NVIDIA) and matrix cores (AMD), we have extensively characterized rounding modes (truncation versus round-to-nearest), the number of extra internal bits kept (whether 3 bits are kept or not), and subnormal support for inputs and outputs across four different floating-point formats and across NVIDIA A100 and AMD MI250X GPUs. We believe that this wealth of data becoming available for the first time may help avoid significant porting bugs when migrating code across these platforms.
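
The ulp figures quoted in the abstract count how many representable floating-point values separate two results. The sketch below is not taken from the paper's test harness; it is a minimal host-side illustration of one common way to compute such a distance for FP32, assuming the two device results have already been copied back as ordinary numbers. The function names `float32_to_ordered_int` and `ulp_distance_fp32` are hypothetical, and NaN inputs are not handled.

```python
import struct


def float32_to_ordered_int(x: float) -> int:
    """Round x to FP32 and map its bit pattern to a signed integer
    whose ordering matches the ordering of the floating-point values."""
    # Reinterpret the 4 bytes of the FP32 encoding as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        # Negative values: larger magnitude must map to a smaller integer.
        return -(bits & 0x7FFFFFFF)
    return bits


def ulp_distance_fp32(a: float, b: float) -> int:
    """Number of FP32 steps between a and b (assumption: neither is NaN)."""
    return abs(float32_to_ordered_int(a) - float32_to_ordered_int(b))


if __name__ == "__main__":
    reference = 1.0
    # Hypothetical second result that is seven FP32 steps above the reference.
    other = 1.0 + 7 * 2.0 ** -23
    print(ulp_distance_fp32(reference, other))
```

With this mapping, +0.0 and -0.0 are treated as the same value, and the distance is counted uniformly across normal and subnormal numbers. The same idea extends to FP64 by switching to the 8-byte `"<d"` and `"<Q"` format codes and a 63-bit magnitude mask.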

Published: March 28, 2025

Citation

Li X., A. Li, B. Fang, K. Swirydowicz, I. Laguna, and G. Gopalakrishnan. 2024. FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators. In IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2024), May 6-9, 2024, Philadelphia, PA, 39-46. Piscataway, New Jersey: IEEE. PNNL-SA-190978. doi:10.1109/CCGrid59990.2024.00014
