July 26, 2024
Conference Paper
Understanding the Inner-Workings of Language Models Through Representation Dissimilarity
Abstract
We use model stitching to understand the internal representations of language models. As with vision models, we find that "more is better": representations learned with more data and larger width can improve the performance of weaker models via stitching. We likewise find that certain architectural choices (e.g., GeLU vs. SoLU activation functions) influence the quality of learned representations. Finally, model stitching (unlike other model diagnostic methods, such as mode connectivity) can localize the different generalization strategies of text classifiers under domain shift to specific hidden layers.
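To make the stitching procedure concrete, below is a minimal sketch in PyTorch. The general idea behind model stitching is to run the lower layers of one frozen model, map their hidden states through a small trainable "stitching" layer, and feed the result into the upper layers of a second frozen model; if the stitched model performs well, the first model's representations are usable by the second. All module names, dimensions, and the toy stand-in networks here are hypothetical placeholders, not the paper's actual models or training setup.

```python
import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    """Bottom layers of model A -> trainable linear stitch -> top layers of model B."""

    def __init__(self, bottom_a: nn.Module, top_b: nn.Module, d_a: int, d_b: int):
        super().__init__()
        self.bottom_a = bottom_a
        self.top_b = top_b
        self.stitch = nn.Linear(d_a, d_b)  # the only trainable component
        # Freeze both donor models; only the stitch is optimized.
        for p in self.bottom_a.parameters():
            p.requires_grad = False
        for p in self.top_b.parameters():
            p.requires_grad = False

    def forward(self, x):
        h = self.bottom_a(x)   # representations at the stitching point
        h = self.stitch(h)     # learned affine map between representation spaces
        return self.top_b(h)

# Toy usage with stand-in networks (hypothetical shapes, not real LMs):
bottom_a = nn.Sequential(nn.Linear(16, 32), nn.GELU())  # stands in for A's lower layers
top_b = nn.Sequential(nn.Linear(64, 8))                 # stands in for B's upper layers
model = StitchedModel(bottom_a, top_b, d_a=32, d_b=64)

opt = torch.optim.Adam(model.stitch.parameters(), lr=1e-3)
x, y = torch.randn(4, 16), torch.randint(0, 8, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

The stitching penalty (the gap between the stitched model's performance and the unstitched model's) then serves as a dissimilarity measure between the two models' representations at the chosen layer.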