June 1, 2018
Conference Paper

Predicting Foreign Language Usage from English-Only Social Media Posts

Abstract

Social media is known for its multicultural and multilingual interactions, a natural product of which is code-mixing. Multilingual speakers mix the languages in which they tweet to address a different audience, express certain feelings, or attract attention. This paper presents a large-scale analysis of 6 million tweets produced by 23 thousand multilingual users who speak 11 languages in addition to English. We rely on this multilingual corpus to build predictive models for a novel task: inferring, exclusively from users' English tweets, the non-English languages that they speak. We contrast the predictive power of different linguistic signals and report that the lexical content and syntactic structure of English tweets are the most predictive of the non-English languages that users speak on Twitter. By analyzing cross-lingual transfer, that is, the influence of non-English languages on various levels of linguistic performance in English, we present novel findings on stylistic and syntactic variations across speakers of 11 languages.

Revised: May 31, 2018 | Published: June 1, 2018

Citation

Volkova S., S.M. Ranshous, and L.A. Phillips. 2018. "Predicting Foreign Language Usage from English-Only Social Media Posts." In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2018), June 1-6, 2018, New Orleans, Louisiana, 608-614. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-124151.
