Pew Research polls report that 62 percent of
U.S. adults get news on social media (Gottfried
and Shearer, 2016). In a December 2016
poll, 64 percent of U.S. adults said
that “made-up news” has caused a “great
deal of confusion” about the facts of current
events (Barthel et al., 2016). Fabricated
stories spread on social media, ranging
from deliberate propaganda to hoaxes
and satire, contribute to this confusion, in
addition to having serious effects on global
stability.
In this work we build predictive models to
classify 130,000 news tweets as suspicious
or verified, and to predict four subtypes
of suspicious news – satire, hoaxes,
clickbait, and propaganda. We demonstrate
that neural network models trained
on tweet content and social network interactions
outperform lexical models. Unlike
previous work on deception detection,
we find that adding syntax and grammar
features to our models decreases performance.
Incorporating linguistic features,
including bias and subjectivity, improves
classification results; however, social interaction
features are most informative for
finer-grained separation among our four
types of suspicious news posts.
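The abstract describes models that fuse tweet-content features with social-interaction features to separate suspicious from verified news posts. A minimal sketch of that fusion idea, assuming illustrative features and a simple perceptron in place of the paper's neural models (all feature names and values below are hypothetical, not from the paper):

```python
# Hypothetical sketch: concatenating tweet-content features (e.g., a
# subjectivity score) with social-interaction features (e.g., a scaled
# retweet count) and training a binary suspicious/verified classifier.
# A plain perceptron stands in for the paper's neural network models.

def fuse_features(text_feats, social_feats):
    """Concatenate content and social-interaction feature vectors."""
    return text_feats + social_feats

def perceptron_train(samples, labels, epochs=20, lr=0.1):
    """Train a minimal perceptron; labels are +1 (suspicious) / -1 (verified)."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge weights toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (suspicious) or -1 (verified) for a fused feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy usage: one high-subjectivity, highly-shared post vs. a low one.
X = [fuse_features([0.9], [0.8]), fuse_features([0.1], [0.2])]
y = [1, -1]
w, b = perceptron_train(X, y)
```

The point of the sketch is only the feature fusion: content and social signals enter one shared vector, so the classifier can weigh both, which mirrors the paper's finding that social interaction features carry much of the discriminative signal.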
Revised: July 31, 2017 |
Published: July 30, 2017
Citation
Volkova, S., K.J. Shaffer, J. Jang, and N.O. Hodas. 2017. Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (July 30–August 4, 2017, Vancouver, BC, Canada), Volume 2, 647–653. PNNL-SA-123856. doi:10.18653/v1/P17-2102