September 20, 2024
Conference Paper

RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing

Abstract

Ensuring high-quality recommendations for newly onboarded users requires the continuous retraining of Deep Learning Recommendation Models (DLRMs) with freshly generated data. To serve online DLRM retraining, existing solutions use hundreds of CPU computing nodes designated for input preprocessing, incurring power consumption that surpasses even that of the GPU trainers. To address this, we propose RAP, an end-to-end DLRM training framework that supports Resource-aware Automated GPU sharing for DLRM input Preprocessing and Training. The core idea of RAP is to accurately capture the GPU computing resources left over during DLRM training and use them for input preprocessing, achieving superior training efficiency without requiring additional resources. Specifically, RAP utilizes a co-running cost model to efficiently assess the costs of various input preprocessing operations, and it implements a resource-aware horizontal fusion technique that adaptively merges smaller kernels according to GPU availability, avoiding interference with DLRM training. In addition, RAP leverages a heuristic search algorithm that jointly optimizes both the input preprocessing graph mapping and the co-running schedule to maximize the end-to-end DLRM training throughput. A comprehensive evaluation shows that RAP achieves a 78.3× average speedup over CPU-based DLRM input preprocessing frameworks. Moreover, the end-to-end training throughput of RAP is only 2.04% lower than the ideal case, which has no input preprocessing overhead.
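
To illustrate the kind of resource-aware mapping the abstract describes, the following is a minimal, hypothetical sketch (not the authors' implementation): a greedy heuristic that places input-preprocessing operations onto GPUs according to a simple co-running cost estimate and each GPU's spare compute headroom left over by DLRM training. All names (`PreprocOp`, `schedule_preprocessing`, the headroom numbers) are assumptions made for illustration only.

```python
# Illustrative sketch only, not RAP's actual algorithm: greedily assign
# preprocessing ops to the GPU with the most remaining headroom so that
# large kernels land where the most spare capacity is, limiting
# interference with the co-running DLRM training job.

from dataclasses import dataclass


@dataclass
class PreprocOp:
    name: str
    cost: float  # estimated co-running cost (arbitrary units, hypothetical)


def schedule_preprocessing(ops, gpu_headroom):
    """Map ops to GPUs by descending cost, filling spare capacity first.

    ops: list of PreprocOp
    gpu_headroom: dict {gpu_id: spare compute left over by DLRM training}
    returns: dict {gpu_id: [ops assigned to that GPU]}
    """
    placement = {gpu: [] for gpu in gpu_headroom}
    for op in sorted(ops, key=lambda o: o.cost, reverse=True):
        target = max(gpu_headroom, key=gpu_headroom.get)
        placement[target].append(op)
        gpu_headroom[target] -= op.cost
    return placement


if __name__ == "__main__":
    ops = [
        PreprocOp("hash_categorical", 3.0),
        PreprocOp("bucketize", 2.0),
        PreprocOp("fill_missing", 1.0),
        PreprocOp("log_transform", 0.5),
    ]
    # Hypothetical spare compute per GPU measured during DLRM training.
    headroom = {0: 4.0, 1: 2.5}
    print(schedule_preprocessing(ops, headroom))
```

In the paper, this placement problem is solved jointly with kernel fusion and the co-running schedule; the sketch above only conveys the high-level idea of filling leftover GPU capacity with preprocessing work.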

Published: September 20, 2024

Citation

Wang, Z., Y. Wang, J. Deng, D. Zheng, A. Li, and Y. Ding. 2024. "RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing." In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024), April 27-May 1, 2024, San Diego, CA, Vol. 2, 964-979. New York, NY: Association for Computing Machinery. PNNL-SA-189479. doi:10.1145/3620665.3640406