July 26, 2024
Conference Paper

Understanding the Inner-Workings of Language Models Through Representation Dissimilarity

Abstract

We use model stitching to understand the internal representations of language models. As with vision models, we find that "more is better": representations learned with more data and larger width can improve the performance of weaker models via stitching. We also find that certain architectural choices, such as using GeLU versus SoLU activation functions, influence the quality of learned representations. Finally, model stitching (unlike other model diagnostic methods, such as mode connectivity) can localize the different generalization strategies of text classifiers under domain shift to specific hidden layers.
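Model stitching, as used above, means learning a simple (typically linear or affine) map that plugs the layer-ℓ representations of one frozen model into the remaining layers of another; good stitched performance suggests the two representations are compatible. The following is a minimal numpy sketch of the idea on toy stand-ins for the two models; the dimensions, the least-squares fit (standing in for the usual gradient-based training of the stitch), and all variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two frozen models (illustrative, not the paper's setup):
# X       : layer-l representations from the "donor" model on a probe set
# T       : the "recipient" model's own layer-l representations on the same inputs
# W_head  : the recipient model's frozen remaining layers (here, a linear readout)
d_a, d_b, n = 32, 48, 500
W_head = rng.normal(size=(d_b, 10))

X = rng.normal(size=(n, d_a))
# Construct T as a noisy affine function of X -- a setting where a linear
# stitch should succeed, mimicking "compatible" representations.
T = X @ rng.normal(size=(d_a, d_b)) + 0.01 * rng.normal(size=(n, d_b))

# Fit the stitching layer: a linear map S minimizing ||X S - T||_F.
S, *_ = np.linalg.lstsq(X, T, rcond=None)

# Stitched forward pass: donor representations -> stitch -> recipient head.
logits_stitched = (X @ S) @ W_head
logits_recipient = T @ W_head

# A small relative disagreement ("stitching penalty") suggests the donor's
# representations carry the information the recipient's head needs.
rel_err = np.linalg.norm(logits_stitched - logits_recipient) / np.linalg.norm(logits_recipient)
print(f"relative stitching error: {rel_err:.4f}")
```

In practice the stitch is trained with the downstream task loss while both surrounding models stay frozen, and the stitched model's task accuracy (rather than this toy reconstruction error) is the quantity compared across layers.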


Citation

Brown D.R., C.W. Godfrey, N.C. Konz, J.H. Tu, and H.J. Kvinge. 2023. Understanding the Inner-Workings of Language Models Through Representation Dissimilarity. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore, edited by H. Bouamor, J. Pino, and K. Bali, 6543-6558. Kerrville, Texas: Association for Computational Linguistics. PNNL-SA-186399. doi:10.18653/v1/2023.emnlp-main.403
