August 1, 2024
Conference Paper
AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators
Abstract
Tensor algebra operations represent an important class of algorithms used across many applications, including machine learning, scientific computing, and data analytics. As a result, the efficient generation of custom accelerators for tensor operations has received increased attention. Previous efforts have produced automated tools enabling users to prototype and explore optimized accelerators. However, these tools have devoted little attention to host-accelerator interaction. Efficient use of hardware accelerators requires knowledge of the accelerator's capabilities (operations, data formats, and opcode support), the host CPU microarchitecture (e.g., memory hierarchy), the host-accelerator interface, and the application's features (which code regions should be mapped onto an accelerator). Manually rewriting the original application to improve its mapping onto a custom accelerator is an error-prone and time-consuming endeavor. To address this, we propose AXI4MLIR, a new framework that automatically generates and optimizes the communication between the host CPU and arbitrary accelerators implementing linear algebra algorithms. AXI4MLIR extends the MLIR compiler framework to automatically generate efficient host-accelerator driver code for accelerators with AXI-based interfaces. Our compiler extensions enable automatic driver code generation while carefully considering the host's memory hierarchy and the target accelerator's features. To demonstrate the flexibility and utility of AXI4MLIR, we test it with diverse use cases that include different types of accelerators, tiling scenarios, and dataflow schemes. We compare our experimental results to manual implementations of host-accelerator driver code and find that our approach can reduce CPU cache references by 56% and deliver up to a 1.65x speedup.
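To make the host-side burden concrete, the sketch below shows the kind of hand-written, memory-mapped driver code that frameworks like AXI4MLIR aim to generate automatically. It is a minimal illustration only: the base address, register offsets, opcode, and bit masks are hypothetical placeholders, not the interface of any accelerator evaluated in the paper, and the tiled data-movement loops are elided.

```c
/* A minimal, hypothetical sketch of hand-written host driver code for an
 * AXI-Lite-controlled accelerator. All addresses, register offsets, and
 * the opcode below are illustrative assumptions, not the interface of any
 * accelerator from the paper. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define ACCEL_BASE   0x40000000UL  /* hypothetical AXI-Lite base address */
#define MAP_SIZE     0x1000UL
#define REG_CTRL     0x00          /* hypothetical control register      */
#define REG_STATUS   0x04          /* hypothetical status register       */
#define REG_OPCODE   0x08          /* hypothetical opcode register       */
#define CTRL_START   0x1
#define STATUS_DONE  0x2
#define OP_MATMUL    0x3           /* hypothetical matmul opcode         */

int main(void) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the accelerator's control registers into the host address space. */
    volatile uint32_t *regs = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, ACCEL_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Select the operation, start the accelerator, then busy-wait.
     * The per-tile data transfers (e.g., over an AXI-Stream DMA) that
     * would surround these register accesses are omitted; sizing those
     * tiles to the host cache hierarchy is exactly the bookkeeping
     * AXI4MLIR automates. */
    regs[REG_OPCODE / 4] = OP_MATMUL;
    regs[REG_CTRL / 4]   = CTRL_START;
    while (!(regs[REG_STATUS / 4] & STATUS_DONE))
        ; /* poll until the accelerator signals completion */

    munmap((void *)regs, MAP_SIZE);
    close(fd);
    return 0;
}
```

Writing and tuning this boilerplate by hand for every accelerator, opcode, and tiling scheme is the error-prone effort the abstract describes; generating it from the compiler, with the host memory hierarchy in view, is where the reported cache-reference and speedup gains come from.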