Conference Paper

Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign

Abstract

Deep neural networks (DNNs) have achieved tremendous success in the past few years. However, their training and inference demand substantial computational and memory resources. Quantization has been shown to be an effective approach to mitigating this cost, with mainstream data types reduced from FP32 to FP16/BF16 and, most recently, FP8 in the latest NVIDIA H100 GPUs. With increasingly aggressive quantization, however, conventional floating-point formats suffer from limited precision when representing numbers around zero. Recently, NVIDIA demonstrated the potential of using a Logarithmic Number System (LNS) for the next generation of tensor cores. While LNS mitigates the difficulty of representing small numbers, in this work we observe a mismatch between LNS and emerging Large Language Models (LLMs), which exhibit significant outliers when directly adopting the LNS format. In this paper, we present a data-format/architecture codesign to bridge this gap. On the format side, we propose a dynamic LNS format that flexibly represents outliers at higher precision by exploiting asymmetry in the LNS representation and identifying outliers on a per-vector basis. On the architecture side, as a demonstration, we realize the dynamic LNS format in a systolic array that handles the irregularity of the outliers at runtime. We implement our approach as a prototype on an Alveo U280 FPGA. Experimental results show that our design effectively handles the outliers and resolves the mismatch between LNS and LLMs, yielding accuracy improvements of 15.4% and 16% over the floating-point and original LNS baselines, respectively, on four state-of-the-art LLMs. Our observations and design lay a solid foundation for large-scale adoption of the LNS format in next-generation deep learning hardware.
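
To make the core idea concrete, the following NumPy sketch illustrates one plausible reading of a per-vector dynamic-LNS quantizer: each value is stored as a sign plus a quantized log2 magnitude, and values whose log-magnitude deviates strongly from the vector's statistics are flagged as outliers and kept on a finer exponent grid. This is our own simplified illustration, not the paper's actual format; the function names, bit widths, and the deviation-based outlier test are assumptions, and it does not model the asymmetry-exploiting encoding or the systolic-array datapath described in the paper.

    import numpy as np

    def lns_encode(vec, frac_bits=3, outlier_frac_bits=6, outlier_thresh=3.0):
        """Encode a vector as (sign, quantized log2 magnitude) pairs,
        flagging per-vector outliers that keep more fractional exponent bits.
        Illustrative sketch only; parameters are hypothetical."""
        sign = np.sign(vec)
        log_mag = np.log2(np.abs(vec) + 1e-30)            # log-domain magnitude
        mu, sigma = log_mag.mean(), log_mag.std() + 1e-12
        # Flag values whose log-magnitude deviates far from the vector mean.
        is_outlier = np.abs(log_mag - mu) > outlier_thresh * sigma
        # Coarse exponent grid for inliers, finer grid for flagged outliers.
        scale = np.where(is_outlier, 2.0 ** outlier_frac_bits, 2.0 ** frac_bits)
        q_exp = np.round(log_mag * scale) / scale
        return sign, q_exp, is_outlier

    def lns_decode(sign, q_exp):
        """Reconstruct approximate real values from the LNS encoding."""
        return sign * np.exp2(q_exp)

For example, encoding an activation vector that contains a few large-magnitude entries round-trips the inliers on the coarse exponent grid while the flagged outliers retain finer log-domain resolution, which is the behavior the dynamic format is designed to provide.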

Published: January 8, 2025

Citation

Haghi, P., C. Wu, Z. Azad, Y. Li, A. Gui, Y. Hao, A. Li, et al. 2024. Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign. In Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO 2024), November 2-6, 2024, Austin, TX, 1617-1631. Los Alamitos, California: IEEE Computer Society. PNNL-SA-192868. doi:10.1109/MICRO61859.2024.00118