August 13, 2025
Conference Paper
GMI-DRL: Empowering Multi-GPU DRL with Adaptive-Grained Parallelism
Abstract
With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention from various domains. However, DRL computation faces tremendous challenges due to its heterogeneous workloads and interleaved execution pattern, making it hard to exploit the full potential of uniform high-performance GPUs. To this end, we propose GMI-DRL, a systematic design to accelerate DRL on GPU platforms with GPU spatial multiplexing. We introduce a novel design of resource-efficient GPU multiplexing instances (GMIs) that matches the actual needs of DRL tasks, an adaptive GMI management strategy that simultaneously achieves high GPU utilization and high DRL training throughput, and highly efficient inter-GMI communication support that meets DRL communication demands. We also incorporate a process-based GMI programming interface to ease access to key GMI functionalities. Comprehensive experiments reveal that GMI-DRL outperforms the state-of-the-art NVIDIA Isaac Gym with NCCL (by up to 2.07×) and with Horovod (by up to 2.02×) in training throughput on the DGX-A100 platform. Our work provides the first comprehensive solution for GPU spatial multiplexing and an initial user experience with it in processing heterogeneous workloads that mix computation and communication; we expect it to benefit future GPU cloud resource provisioning and task scheduling across diverse application settings.
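To make the process-based GMI programming model concrete, the sketch below shows one plausible way to bind one worker process per GPU multiplexing instance, with each process confined to its slice before any CUDA context is created. This is a minimal illustration under stated assumptions, not the paper's actual API: the MIG UUIDs are placeholders for whatever partitioning `nvidia-smi -L` reports, and the actor/learner role split is a hypothetical example of heterogeneous DRL tasks.

```python
"""A minimal sketch of a process-per-GMI interface, in the spirit of
GMI-DRL's process-based programming model. The GMI UUIDs and the
actor/learner roles below are illustrative assumptions, not the
paper's actual API."""
import os
import multiprocessing as mp


def gmi_worker(gmi_uuid: str, role: str, queue: mp.Queue) -> None:
    # Restricting CUDA_VISIBLE_DEVICES to a single MIG UUID before any
    # CUDA context is created confines this process to one GPU slice.
    os.environ["CUDA_VISIBLE_DEVICES"] = gmi_uuid
    # ... set up the DRL task here, e.g. environment simulation for an
    # "actor" GMI or gradient computation for a "learner" GMI ...
    queue.put((role, gmi_uuid, os.getpid()))


def main() -> None:
    # Placeholder MIG instance UUIDs; real values come from `nvidia-smi -L`
    # and depend on how the A100 has been partitioned.
    gmis = {
        "MIG-aaaaaaaa-0000-0000-0000-000000000000": "actor",
        "MIG-bbbbbbbb-0000-0000-0000-000000000000": "learner",
    }
    queue: mp.Queue = mp.Queue()
    procs = [
        mp.Process(target=gmi_worker, args=(uuid, role, queue))
        for uuid, role in gmis.items()
    ]
    for p in procs:
        p.start()
    for _ in procs:
        print("started GMI worker:", queue.get())
    for p in procs:
        p.join()


if __name__ == "__main__":
    # "spawn" gives each child a fresh process, so each GMI worker can
    # initialize its own CUDA context against its assigned slice.
    mp.set_start_method("spawn")
    main()
```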