October 2, 2021
Conference Paper

High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers

Abstract

The increased need for efficient ways to implement domain-specific accelerators is driving design methodologies towards the use of abstractions higher than the Register Transfer Level (RTL). In this scenario, High-Level Synthesis (HLS) plays a significant role by enabling the automatic generation of custom hardware accelerators starting from high-level descriptions (e.g., C code). Conventional HLS tools exploit parallelism mostly at the instruction level (Instruction Level Parallelism, ILP). They statically schedule the input specifications and build centralized Finite State Machine (FSM) controllers. However, aggressive exploitation of ILP has diminishing returns in many applications and, usually, centralized approaches do not efficiently exploit coarser parallelism because FSMs are inherently serial. In this paper we present an HLS framework able to synthesize applications that, besides ILP, also expose Task Level Parallelism (TLP). An application can expose TLP through annotations that identify the parallel functions (i.e., tasks). To generate accelerators that efficiently execute concurrent tasks, we need to solve several issues: devise a mechanism to support concurrent execution flows, exploit memory parallelism, and manage synchronization. To support concurrent execution flows, we introduce a novel adaptive controller. The adaptive controller is composed of a set of interacting control elements, each of which independently manages the execution of a single operation or function call. These control elements check dependencies and resource constraints at runtime, enabling as-soon-as-possible execution. To support parallel access to shared memories and synchronization, we introduce a novel Hierarchical Memory Interface (HMI). With respect to previous solutions, the proposed interface supports multi-ported memories and atomic memory operations, which commonly occur in parallel programming.
Our framework can generate the hardware implementation of a C function by employing one of two different approaches, depending on the function's characteristics. If a function exposes TLP, then the framework generates a hardware implementation based on the adaptive controller. Otherwise, the framework implements the function with a more conventional FSM approach, which is optimized for ILP exploitation. We evaluate our framework on a set of parallel applications, and show substantial performance improvements (average speedup of 4.7) with limited area overheads (average area increase of 5.48 times).
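
The runtime behavior of the adaptive controller described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the hardware the framework generates: the `ControlElement` structure, the `simulate_adaptive_controller` function, the fixed limits, and the cycle-level timing model are all hypothetical names and simplifications introduced here. Each element watches its own dependencies and its functional unit, and starts its operation in the first cycle in which both are available.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS  16   /* assumed upper bound on operations and units */
#define MAX_DEPS 4    /* assumed upper bound on producers per operation */

/* Hypothetical control element: manages exactly one operation or call. */
typedef struct {
    int  latency;          /* cycles the operation occupies its unit (> 0) */
    int  resource;         /* index of the functional unit it needs */
    int  deps[MAX_DEPS];   /* indices of the operations it waits for */
    int  n_deps;
    int  remaining;        /* cycles left while running */
    bool started, done;
} ControlElement;

/* Simulates the elements cycle by cycle. Returns the number of cycles
   needed to finish all operations, or -1 if no element can make progress
   (for example, a cyclic dependency). */
int simulate_adaptive_controller(ControlElement *ops, int n_ops,
                                 int n_resources) {
    bool busy[MAX_OPS] = { false };
    int cycle = 0, finished = 0;

    assert(n_ops <= MAX_OPS && n_resources <= MAX_OPS);
    for (int i = 0; i < n_ops; i++) {
        assert(ops[i].latency > 0 && ops[i].n_deps <= MAX_DEPS);
        assert(ops[i].resource >= 0 && ops[i].resource < n_resources);
        ops[i].started = ops[i].done = false;
    }

    while (finished < n_ops) {
        int running = 0;

        /* Phase 1: every idle element checks its own dependencies and
           resource, and starts as soon as both conditions hold. */
        for (int i = 0; i < n_ops; i++) {
            if (ops[i].started) continue;
            bool ready = true;
            for (int d = 0; d < ops[i].n_deps; d++)
                if (!ops[ops[i].deps[d]].done) ready = false;
            if (ready && !busy[ops[i].resource]) {
                ops[i].started = true;
                ops[i].remaining = ops[i].latency;
                busy[ops[i].resource] = true;
            }
        }

        /* Phase 2: advance running operations by one cycle; completions
           become visible to dependents in the next cycle. */
        for (int i = 0; i < n_ops; i++) {
            if (!ops[i].started || ops[i].done) continue;
            running++;
            if (--ops[i].remaining == 0) {
                ops[i].done = true;
                busy[ops[i].resource] = false;
                finished++;
            }
        }

        if (running == 0) return -1;  /* nothing can ever start */
        cycle++;
    }
    return cycle;
}
```

In this model, two independent 3-cycle tasks followed by a 1-cycle join finish in 4 cycles when each task has its own unit, and in 7 cycles when both tasks compete for a single unit. No central schedule is computed in advance; the ordering emerges from the local checks each element performs.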

Published: October 2, 2021

Citation

Castellana V.G., A. Tumeo, and F. Ferrandi. 2021. High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers. In IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), May 17-21, 2021, Virtual, Online, Paper No. 9460500. Piscataway, New Jersey: IEEE. PNNL-SA-157406. doi:10.1109/IPDPS49936.2021.00028