With the increasing use of automated, machine learning-driven tools and the downstream impact that algorithmic judgements can have, it is critical to develop models that are robust to evolving or manipulated inputs. To that end, it is essential to evaluate the reliability of multimodal models across linguistic variations and to understand how susceptible they are both to intentional adversarial attacks and to natural linguistic variation. We present an extensive analysis of model robustness and susceptibility to linguistic variation in the setting of deceptive news detection, a difficult classification task whose importance grows as misinformation spreads online. We evaluate the effectiveness of incorporating adversarial defense strategies and measure model susceptibility to state-of-the-art adversarial attacks using two types of linguistic attacks: character and word perturbations. We consider two multiclass prediction tasks, a 3-way classification of tweets as trustworthy, propaganda, or disinformation, and a 4-way classification as clickbait, hoax, satire, or conspiracy, and compare the performance of three embeddings that have been state-of-the-art for several NLP tasks (GloVe, ELMo, and BERT) to highlight consistent trends in susceptibility, high-confidence misclassifications, and high-impact failures. We find that character or mixed ensemble models are the most effective defense mechanisms and that character perturbations are a more effective attack than word perturbations for deception classification.
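The paper itself does not ship code; as a minimal illustrative sketch of the two attack types studied (character-level and word-level perturbations), the following Python snippet shows one simplified way such perturbations could be applied to a tweet. The perturbation strategies and the substitution table here are assumptions for illustration, not the authors' attack implementations.

    import random

    def char_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
        """Simple character-level attack: randomly swap adjacent letters inside words."""
        rng = random.Random(seed)
        chars = list(text)
        for i in range(len(chars) - 1):
            if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)

    def word_perturb(text: str, substitutions: dict, rate: float = 0.3, seed: int = 0) -> str:
        """Simple word-level attack: replace selected words using a substitution table."""
        rng = random.Random(seed)
        out = []
        for w in text.split():
            key = w.lower().strip(".,!?")
            if key in substitutions and rng.random() < rate:
                out.append(substitutions[key])
            else:
                out.append(w)
        return " ".join(out)

    # Hypothetical example tweet and substitution table
    tweet = "Officials confirm the vaccine rollout will begin next week."
    subs = {"officials": "insiders", "confirm": "claim"}
    print(char_perturb(tweet, rate=0.2))
    print(word_perturb(tweet, subs, rate=1.0))

A robustness evaluation along these lines would feed both the original and the perturbed tweets through a trained classifier and compare predictions and confidence, which is the kind of susceptibility measurement the paper reports across embeddings and defense strategies.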
Published: October 28, 2022
Citation
Glenski, M. F., E. M. Ayton, R. J. Cosbey, D. L. Arendt, and S. Volkova. 2021. Evaluating Deception Detection Model Robustness To Linguistic Variation. In Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media (SocialNLP 2021), June 10, 2021, Online Workshop, 70-80. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-156922. doi:10.18653/v1/2021.socialnlp-1.6