July 26, 2024
Conference Paper
Understanding the Inner-Workings of Language Models Through Representation Dissimilarity
Abstract
We use model stitching to understand the internal representations of language models. As with vision models, we find that "more is better": representations learned with more data and larger width can improve the performance of weaker models via stitching. We likewise find that certain architectural choices (e.g., GeLU vs. SoLU activation functions) influence the quality of learned representations. Finally, model stitching (unlike other model diagnostic methods, such as mode connectivity) can localize the different generalization strategies of text classifiers under domain shift to specific hidden layers.
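To make the stitching procedure concrete, below is a minimal sketch in PyTorch. The general idea behind model stitching is to run the lower layers of one frozen model, map their hidden states through a small trainable "stitching" layer, and feed the result into the upper layers of a second frozen model; if the stitched model performs well, the first model's representations are usable by the second. All module names, dimensions, and the toy stand-in networks here are hypothetical placeholders, not the paper's actual models or training setup.

```python
import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    """Bottom layers of model A -> trainable linear stitch -> top layers of model B."""

    def __init__(self, bottom_a: nn.Module, top_b: nn.Module, d_a: int, d_b: int):
        super().__init__()
        self.bottom_a = bottom_a
        self.top_b = top_b
        self.stitch = nn.Linear(d_a, d_b)  # the only trainable component
        # Freeze both donor models; only the stitch is optimized.
        for p in self.bottom_a.parameters():
            p.requires_grad = False
        for p in self.top_b.parameters():
            p.requires_grad = False

    def forward(self, x):
        h = self.bottom_a(x)   # representations at the stitching point
        h = self.stitch(h)     # learned affine map between representation spaces
        return self.top_b(h)

# Toy usage with stand-in networks (hypothetical shapes, not real LMs):
bottom_a = nn.Sequential(nn.Linear(16, 32), nn.GELU())  # stands in for A's lower layers
top_b = nn.Sequential(nn.Linear(64, 8))                 # stands in for B's upper layers
model = StitchedModel(bottom_a, top_b, d_a=32, d_b=64)

opt = torch.optim.Adam(model.stitch.parameters(), lr=1e-3)
x, y = torch.randn(4, 16), torch.randint(0, 8, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

The stitching penalty (the gap between the stitched model's performance and the unstitched model's) then serves as a dissimilarity measure between the two models' representations at the chosen layer.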