The damage cybercriminals cause can have major real-world consequences. In May 2021, a ransomware attack on the Colonial Pipeline disrupted fuel supply chains, resulting in gas shortages across the southeastern United States.
How can we prevent such events from happening again? A new framework, V2W-BERT, introduces a novel way to classify software vulnerabilities to thwart cybercriminals before they can start. It was developed by Siddhartha Das, a graduate student at Purdue University and research intern at Pacific Northwest National Laboratory (PNNL), along with Alex Pothen from Purdue University; Mahantesh Halappanavar from PNNL; Edoardo Serra from Boise State University, who is also a joint appointee at PNNL; and Ehab Al-Shaer from Carnegie Mellon University. This research was recognized with the “Best Application Paper Award” at the Institute of Electrical and Electronics Engineers (IEEE) International Conference on Data Science and Advanced Analytics (DSAA) 2021.
The paper, available on arXiv, details the development of V2W-BERT. This framework allows researchers to computationally map common software vulnerabilities to weakness enumerations. Software vulnerabilities are specific errors or faults in a cyber product's architecture, design, or implementation that cybercriminals can exploit for unintended purposes. If a vulnerability is not fixed, or patched, hackers can use it to cause significant damage. Weakness enumerations, organized as a hierarchically designed dictionary of software weaknesses, provide a blueprint for understanding software flaws and their effects.
Weaknesses are pervasive in cyber products, and hackers are constantly finding new ways to exploit them. A list of these weaknesses, called the Common Weakness Enumeration (CWE), is maintained by the not-for-profit company MITRE. Likewise, records of Common Vulnerabilities and Exposures (CVE) are maintained in the National Vulnerability Database by the Information Technology Laboratory at the National Institute of Standards and Technology (NIST). By mapping CVEs to corresponding CWEs, researchers can understand and predict how the weakness underlying a vulnerability may be used maliciously.
“CVE to CWE mapping is primarily a manual process. This requires human expertise, is error prone, and does not scale. Any tool to automate this process will significantly impact the speed and accuracy with which cyber-defenders can address newly discovered vulnerabilities,” said Mahantesh Halappanavar. At PNNL, Halappanavar is the Advanced Computing, Mathematics and Data (ACMD) Division Chief Scientist and the Group Leader of the Data Sciences and Machine Intelligence group.
The V2W-BERT framework uses the latest advances in artificial intelligence to learn cybersecurity knowledge from textual documents. It then uses this knowledge to establish links between the descriptions of different CVEs and CWEs. Specifically, it looks at a CVE-CWE pair and predicts the confidence with which the given CVE belongs to the given CWE class. Although previous attempts have been made to map CVEs to CWEs, V2W-BERT significantly outperforms them. The improvement is most pronounced for rare CWEs, where little or no training information exists.
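To make the idea of scoring a CVE-CWE pair concrete, here is a minimal Python sketch. It is not the V2W-BERT implementation: it swaps the learned BERT language model for a plain bag-of-words vector and uses cosine similarity as a stand-in for the predicted link confidence. The function names (`embed`, `link_confidence`, `rank_cwes`), the paraphrased CWE descriptions, and the CVE text are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a language-model encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def link_confidence(cve_text, cwe_text):
    """Score one CVE-CWE pair with cosine similarity (0.0 = no overlap, 1.0 = identical)."""
    a, b = embed(cve_text), embed(cwe_text)
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_cwes(cve_text, cwe_descriptions):
    """Score the CVE against every candidate CWE and sort by descending confidence."""
    scored = [
        (cwe_id, link_confidence(cve_text, text))
        for cwe_id, text in cwe_descriptions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Paraphrased weakness descriptions for illustration, not the official CWE text.
CWE_DESCRIPTIONS = {
    "CWE-787": "software writes data past the end of a memory buffer",
    "CWE-89": "software builds an sql query from unsanitized user input",
}

# Hypothetical vulnerability description, not a real CVE entry.
cve_text = "crafted input lets an attacker inject sql commands into the login query"

for cwe_id, score in rank_cwes(cve_text, CWE_DESCRIPTIONS):
    print(f"{cwe_id}: {score:.2f}")
```

Because the pair score depends only on the two descriptions, a weakness with few or no labeled examples can still be ranked as long as it has a textual description. A learned model would replace `embed` and the similarity function with trained components.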
“V2W-BERT has an advanced deep understanding of cybersecurity language. It is even capable of processing vulnerability and weakness descriptions never seen before. Along with its hierarchical classification capability, this gives V2W-BERT the ability to automatically maintain a cyber threat knowledge base. This capability is essential to protecting our nation,” said Edoardo Serra, Associate Professor at Boise State University.
The modular framework of V2W-BERT can be easily expanded to include the latest language models. It can also be adapted to other cybersecurity problem contexts. “We are keen to see our framework put into practice,” said Halappanavar. “Our future work will focus on two main areas. One is scaling larger pre-trained BERT models with high-performance computing platforms to further enhance the classification performance. The other is to automate suggestions for defining new weaknesses to match novel vulnerabilities.”
V2W-BERT is the result of a collaboration formed through PNNL. It brings together researchers in the fields of data science, artificial intelligence, cybersecurity, and high-performance computing. “Thanks to the PNNL high-performance computing infrastructure, we can test V2W-BERT and its evolutions on larger and even more complex data. This will allow us to further explore the boundaries of our technology,” said Serra.
This work was supported by the Department of Energy through the Center for Artificial Intelligence-focused Architectures and Algorithms and the High Performance Data Analytics Program at PNNL, and the Department of Defense. Halappanavar also holds a joint appointment at the Washington State University School of Electrical Engineering and Computer Science.
Reference: S. S. Das, E. Serra, M. Halappanavar, A. Pothen, and E. Al-Shaer, “V2W-BERT: A framework for effective hierarchical multiclass classification of software vulnerabilities,” in the 8th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2021, Porto, Portugal, October 6-9, 2021.