Scale Up Thrust
The Scale Up Thrust focuses on the development of innovative compiler, runtime, and simulation technologies tailored for Compute Express Link (CXL)-attached memories and computing devices, as well as near-memory computing. The primary goal of this thrust is to provide domain scientists with a seamless programming and execution environment, enabling them to efficiently develop and execute scientific, data analytics, and artificial intelligence (AI) workloads.
Runtime
The on-node runtime system is responsible for orchestrating data movement and managing computational tasks across various computing devices and memory systems. This includes not only traditional devices like CPUs and GPUs but also near-memory devices, which are equipped with computing capabilities. The runtime scheduler must make informed decisions about task placement across devices, optimizing for metrics such as performance or energy efficiency. This involves considering multiple devices within a single application while accounting for the specific capabilities of each device, its current utilization, and the location of input data. This approach facilitates the seamless integration of emerging computing and memory technologies, from early-stage analytical models or simulators/emulators to prototypes and commercial products.
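A minimal sketch of this kind of placement decision is shown below. The device names, cost weights, and the additive cost model itself are illustrative assumptions, not the project's actual scheduler; the point is only that placement weighs device capability, current utilization, and data location together.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    supports: set        # hypothetical: kernel kinds this device can run
    utilization: float   # fraction of the device currently busy (0..1)
    speedup: float       # relative throughput vs. a baseline CPU

@dataclass
class Task:
    kernel: str
    data_location: str   # device whose memory currently holds the inputs
    transfer_cost: float # estimated cost of moving inputs elsewhere

def place(task, devices):
    """Pick the device with the lowest estimated cost for this task.

    Toy cost model: unit compute time scaled by the device's relative
    speed, inflated by current utilization, plus a data-movement penalty
    when the inputs live in another device's memory.
    """
    candidates = [d for d in devices if task.kernel in d.supports]
    def cost(d):
        compute = (1.0 / d.speedup) * (1.0 + d.utilization)
        move = 0.0 if d.name == task.data_location else task.transfer_cost
        return compute + move
    return min(candidates, key=cost)

cpu = Device("cpu", {"gemm", "scan"}, utilization=0.1, speedup=1.0)
gpu = Device("gpu", {"gemm"}, utilization=0.8, speedup=8.0)
nmp = Device("near-mem", {"scan"}, utilization=0.0, speedup=2.0)

# A GEMM whose inputs already sit in GPU memory lands on the GPU,
# while a scan over data resident in the near-memory device stays there.
print(place(Task("gemm", "gpu", 0.5), [cpu, gpu, nmp]).name)
print(place(Task("scan", "near-mem", 0.5), [cpu, gpu, nmp]).name)
```

In a real runtime the cost terms would come from profiling or analytical models of each device rather than fixed constants, but the structure of the decision is the same.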
Compiler
The compiler infrastructure is designed to lower high-level programs, such as those written in PyTorch or JAX, into low-level machine code optimized for the target architecture. It focuses on both performance and energy efficiency. The compiler must not only generate efficient code for specific computing devices but also manage data placement across memory devices, taking into account the properties of the data and the most suitable memory location. For instance, it considers the “distance” of memory from the computing element and the “speed” at which data can be transferred between device memories. Built on the MLIR and LLVM frameworks, the compiler offers composability with a wide range of tools and optimization passes from the community, enabling progressive code lowering and the application of specific optimizations at various stages of the process.
Simulation/Emulation
In the early phases of developing a new computing or memory device, the actual hardware may not be available, and its design is often not finalized. Nonetheless, it is crucial to understand how the hardware will behave and to ensure that applications and software are ready when it arrives. Simulation and emulation techniques address these challenges by enabling "what-if" studies without the high cost of physical hardware development. Our methodologies span a range of techniques, from QEMU-based emulation to analytical models, architectural simulators, and FPGA prototypes, with the aim of characterizing, understanding, and influencing the hardware development cycle.
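At the analytical-model end of that spectrum, a "what-if" study can be as simple as the first-order sketch below. The parameter values are illustrative assumptions (not measurements from any real device): it projects runtime as the far-memory fraction grows, given a measured baseline, the share of time spent in memory, and an assumed far-tier latency penalty.

```python
def projected_runtime(base_s, mem_fraction, far_fraction, far_penalty):
    """First-order what-if model for CXL-attached (far) memory.

    base_s       : measured runtime (s) with all data in local DRAM
    mem_fraction : share of that runtime spent in memory accesses
    far_fraction : share of accesses served from the far (CXL) tier
    far_penalty  : far-tier access cost relative to local DRAM (e.g., 3x)
    """
    compute = base_s * (1.0 - mem_fraction)
    local = base_s * mem_fraction * (1.0 - far_fraction)
    far = base_s * mem_fraction * far_fraction * far_penalty
    return compute + local + far

# Sweep far-memory placement for a memory-bound-ish workload to see
# where the projected slowdown becomes unacceptable.
for f in (0.0, 0.25, 0.5):
    print(f, projected_runtime(10.0, mem_fraction=0.4,
                               far_fraction=f, far_penalty=3.0))
```

A model like this is only a starting point; architectural simulators and FPGA prototypes refine these estimates once more of the design is fixed.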