August 5, 2021
Journal Article

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing

Abstract

In this work, we propose ARENA, a data-centric computation flow architecture (CFA) for future reconfigurable HPC clusters. ARENA links future reconfigurable nodes into a closed chain and brings computation to the nodes where the data resides. The computation logic of an application is partitioned into a task flow. Task tokens are injected asynchronously into the chain in compliance with their dependencies. A task token carries the configuration “mould” that describes how the hardware substrate resources (e.g., LUT, CGRA, etc.) should be configured to best execute this particular task. As tokens circulate along the chain, each node routes them to the next node while checking whether the task applies to data that resides locally and whether the local node currently has sufficient substrate area to concretize the task configuration. If so, the task is fetched from the chain and a certain area is configured as a special accelerator to execute it. Otherwise, the token is conveyed to the next node. An extra table can be carried along with the token to indicate which nodes have to be revisited in the next circulation, which can reduce the task circulation overhead given a reconfigurable network.
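The token circulation described in the abstract can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the ARENA implementation: the names `TaskToken`, `Node`, and `run_ring` are hypothetical, tokens are processed one at a time rather than asynchronously, substrate area is a plain integer that is released as soon as a task finishes, and the optional revisit table is omitted.

```python
from dataclasses import dataclass


@dataclass
class TaskToken:
    """Hypothetical task token: names the data it needs and the area it occupies."""
    name: str
    data_key: str                 # identifier of the data partition the task operates on
    area: int                     # substrate area required by the configuration "mould"
    deps: frozenset = frozenset() # names of tasks that must complete first


@dataclass
class Node:
    """Hypothetical ring node: holds some data and a budget of free substrate area."""
    node_id: int
    local_data: set
    free_area: int


def run_ring(nodes, tokens):
    """Circulate tokens around the ring; return (task, node_id, hops) per executed task."""
    done = set()
    log = []
    waiting = list(tokens)
    while waiting:
        # Only tokens whose dependencies are satisfied may be injected.
        ready = [t for t in waiting if t.deps <= done]
        if not ready:
            raise ValueError("remaining tokens have unsatisfiable dependencies")
        for token in ready:
            for hops, node in enumerate(nodes, start=1):  # one lap around the ring
                holds_data = token.data_key in node.local_data
                has_area = node.free_area >= token.area
                if holds_data and has_area:
                    node.free_area -= token.area  # configure an accelerator region
                    # ... the task would execute here ...
                    node.free_area += token.area  # simplification: release immediately
                    log.append((token.name, node.node_id, hops))
                    done.add(token.name)
                    break
                # Otherwise the token is conveyed to the next node.
            else:
                raise RuntimeError(f"no node can host task {token.name!r}")
            waiting.remove(token)
    return log
```

In this model a token is forwarded hop by hop until it reaches a node that both holds the required data and has enough free area, which mirrors the two checks named in the abstract. A fuller model would let tokens overlap in flight and hold area for the duration of a task, at which point the revisit table becomes useful.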

Published: August 5, 2021

Citation

Tan C., C. Xie, T. Geng, A. Marquez, A. Tumeo, K.J. Barker, and A. Li. 2021. ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing. IEEE Transactions on Parallel and Distributed Systems 32, no. 12:2880-2892. PNNL-SA-152862. doi:10.1109/TPDS.2021.3081074