March 24, 2020
Journal Article

Multiple social platforms reveal actionable signals for software vulnerability awareness: A study of GitHub, Twitter and Reddit

Abstract

Software vulnerabilities are flaws in computer systems that leave users open to attack. In many cases, these vulnerabilities go unnoticed and remain unresolved in codebases. Thus, awareness of software vulnerabilities among the public is crucial to ensure effective cybersecurity practices, the development of high-quality software, and ultimately national security. This awareness can be better understood by studying the spread and evolution of software vulnerability discussions in online communities. This work is the first to evaluate and contrast how discussions about software vulnerabilities spread on three social platforms -- Twitter, GitHub, and Reddit. To lay the groundwork, we showcase a novel fundamental framework for measuring information spread that identifies the spread mechanisms and observables across platforms, the units of information, and the groups of measurements that can be applied to focus on specific phenomena, e.g., information cascades. We then analyze and contrast social network topologies for three example social networks and measure the scale and speed of the spread of discussion of specific vulnerabilities to understand how far and how widely they spread, how many users participate in discussions, and the duration of their spread. To demonstrate the awareness of more impactful software vulnerabilities, a subset of our analysis focuses on vulnerabilities targeted during recent major cyber attacks as well as vulnerabilities exploited by advanced persistent threat groups. We discover that vulnerability discussions usually start on GitHub before occurring on Twitter and Reddit. While studying how some user-level and content-level characteristics influence vulnerability spread, we observe that Twitter discussions started by users predicted to be humans have larger size, breadth, depth, adoption rate, lifetime, and structural virality compared to those started by users predicted to be bots.
On Reddit, we contrast the differences in thread structure that originate from posts with positive, negative and neutral polarity. We find that posts that are positive have larger, deeper and wider discussions compared to negative and neutral posts. We anticipate the results of our analysis to not only increase the understanding of software vulnerability awareness but also inform models for simulating information spread across multiple social environments online.
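
The cascade measurements named in the abstract (size, depth, breadth, and structural virality) can be illustrated with a minimal sketch. The abstract does not define these quantities, so the definitions below are assumptions drawn from common usage in the information-cascade literature: size is the number of nodes, depth is the longest root-to-node distance, breadth is the largest number of nodes at any single depth, and structural virality is the mean shortest-path distance over all pairs of nodes. The function name `cascade_metrics` and the `(parent, child)` edge format are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


def cascade_metrics(edges, root):
    """Compute size, depth, breadth, and structural virality of a cascade tree.

    edges: list of (parent, child) pairs forming a tree rooted at `root`
           (hypothetical input format, assumed for illustration).
    """
    children = defaultdict(list)   # directed: parent -> children
    neighbors = defaultdict(list)  # undirected adjacency for pairwise distances
    for parent, child in edges:
        children[parent].append(child)
        neighbors[parent].append(child)
        neighbors[child].append(parent)

    # Breadth-first search from the root assigns each node its depth level.
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            level[child] = level[node] + 1
            queue.append(child)

    size = len(level)
    depth = max(level.values())

    # Breadth: the largest number of nodes found at any one depth level.
    nodes_per_level = defaultdict(int)
    for d in level.values():
        nodes_per_level[d] += 1
    breadth = max(nodes_per_level.values())

    # Structural virality (assumed definition): mean shortest-path distance
    # over all ordered pairs of distinct nodes, via one BFS per node.
    total_distance = 0
    for source in level:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total_distance += sum(dist.values())
    ordered_pairs = size * (size - 1)
    virality = total_distance / ordered_pairs if ordered_pairs else 0.0

    return {
        "size": size,
        "depth": depth,
        "breadth": breadth,
        "structural_virality": virality,
    }


# Toy cascade: a -> b, a -> c, b -> d
toy = cascade_metrics([("a", "b"), ("a", "c"), ("b", "d")], root="a")
print(toy)
```

For the toy cascade, the four nodes sit at depths 0, 1, 1, and 2, so size is 4, depth is 2, and breadth is 2. The per-node BFS is quadratic in cascade size, which is adequate for illustration but may be too slow for very large cascades.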

Revised: April 8, 2020 | Published: March 24, 2020

Citation

Shrestha P., A. Visweswara Sathanur, S. Maharjan, E.G. Saldanha, D.L. Arendt, and S. Volkova. 2020. Multiple social platforms reveal actionable signals for software vulnerability awareness: A study of GitHub, Twitter and Reddit. PLoS One 15, no. 3:Article No. e0230250. PNNL-SA-143971. doi:10.1371/journal.pone.0230250