August 13, 2025
Conference Paper
GMI-DRL: Empowering Multi-GPU DRL with Adaptive-Grained Parallelism
Abstract
With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention from various domains. However, DRL computation faces tremendous challenges due to its heterogeneous workloads and interleaved execution pattern, making it hard to exploit the full potential of uniform high-performance GPUs. To this end, we propose GMI-DRL, a systematic design to accelerate DRL on GPU platforms with GPU spatial multiplexing. We introduce a novel design of resource-efficient GPU multiplexing instances (GMIs) that matches the actual needs of DRL tasks, an adaptive GMI management strategy that simultaneously achieves high GPU utilization and high DRL training throughput, and highly efficient inter-GMI communication support that meets DRL communication demands. We also incorporate a process-based GMI programming interface to ease access to key GMI functionalities. Comprehensive experiments reveal that GMI-DRL outperforms the state-of-the-art NVIDIA Isaac Gym with NCCL (by up to 2.07×) and with Horovod (by up to 2.02×) in training throughput on the DGX-A100 platform. Our work provides the first comprehensive solution for GPU spatial multiplexing and an initial user experience with it in processing heterogeneous workloads that mix computation and communication; we expect it to benefit future GPU cloud resource provisioning and task scheduling across diverse application settings.
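To make the process-based GMI programming model concrete, the sketch below shows one plausible way to bind one worker process per GPU multiplexing instance, with each process confined to its slice before any CUDA context is created. This is a minimal illustration under stated assumptions, not the paper's actual API: the MIG UUIDs are placeholders for whatever partitioning `nvidia-smi -L` reports, and the actor/learner role split is a hypothetical example of heterogeneous DRL tasks.

```python
"""A minimal sketch of a process-per-GMI interface, in the spirit of
GMI-DRL's process-based programming model. The GMI UUIDs and the
actor/learner roles below are illustrative assumptions, not the
paper's actual API."""
import os
import multiprocessing as mp


def gmi_worker(gmi_uuid: str, role: str, queue: mp.Queue) -> None:
    # Restricting CUDA_VISIBLE_DEVICES to a single MIG UUID before any
    # CUDA context is created confines this process to one GPU slice.
    os.environ["CUDA_VISIBLE_DEVICES"] = gmi_uuid
    # ... set up the DRL task here, e.g. environment simulation for an
    # "actor" GMI or gradient computation for a "learner" GMI ...
    queue.put((role, gmi_uuid, os.getpid()))


def main() -> None:
    # Placeholder MIG instance UUIDs; real values come from `nvidia-smi -L`
    # and depend on how the A100 has been partitioned.
    gmis = {
        "MIG-aaaaaaaa-0000-0000-0000-000000000000": "actor",
        "MIG-bbbbbbbb-0000-0000-0000-000000000000": "learner",
    }
    queue: mp.Queue = mp.Queue()
    procs = [
        mp.Process(target=gmi_worker, args=(uuid, role, queue))
        for uuid, role in gmis.items()
    ]
    for p in procs:
        p.start()
    for _ in procs:
        print("started GMI worker:", queue.get())
    for p in procs:
        p.join()


if __name__ == "__main__":
    # "spawn" gives each child a fresh process, so each GMI worker can
    # initialize its own CUDA context against its assigned slice.
    mp.set_start_method("spawn")
    main()
```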