July 5, 2023
Report
MetaText: Compositional Generalization in Deep Language Models
Abstract
Compositional generalization, the ability to infer and reason using novel combinations of previously encountered entities and structures, is a trait of great utility across a variety of deep learning tasks: it implies that a model can reason using consistent and therefore interpretable rules, understand the constituent parts of its input at multiple levels of granularity, remain robust to spurious differences between semantically equivalent inputs, and more. We seek to understand the degree to which existing natural language models achieve or fall short of compositional generalization across a variety of tasks, whether there are distinct types of compositional generalization in practice, and to identify avenues of intervention through which we can improve models’ compositional generalization ability. After exploring these questions using an array of models, tasks, and potential interventions, we find that large, pretrained language models have encountered enough training data to account for a variety of fine-grained compositional behaviors, yet they still struggle to reason at the level of phrases or larger language structures. Non-intrusive, data-based interventions, in the form of augmenting individual sequences with compressed versions of themselves or deriving new examples from induced grammars, prove insufficient to encourage greater levels of compositional reasoning. This indicates that future work might benefit most from changing a model’s inductive bias at the architectural or loss level, or from integrating compositionality-boosting data interventions into the large-scale pretraining process itself.
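To make the sequence-level augmentation mentioned above concrete, the following minimal sketch pairs each training sequence with a compressed version of itself, so the model sees the same content at two levels of granularity. The stopword-based compressor and the <compress> separator token are illustrative assumptions for exposition, not the report’s actual implementation.

    # Hypothetical sketch of "compressed self-augmentation":
    # append a shorter, information-dense copy of each sequence to itself.
    # The compression scheme (stopword removal) and separator token are
    # illustrative stand-ins, not the intervention studied in the report.

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "on"}
    SEPARATOR = "<compress>"

    def compress(tokens: list[str]) -> list[str]:
        """Keep only content-bearing tokens (stand-in for a real compressor)."""
        return [t for t in tokens if t.lower() not in STOPWORDS]

    def augment_with_compression(tokens: list[str]) -> list[str]:
        """Concatenate the original sequence, a separator, and its compressed copy."""
        return tokens + [SEPARATOR] + compress(tokens)

    if __name__ == "__main__":
        example = "the cat sat on the mat".split()
        print(augment_with_compression(example))
        # ['the', 'cat', 'sat', 'on', 'the', 'mat', '<compress>', 'cat', 'sat', 'mat']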