April 1, 2023
Conference Paper

Efficient Clustering of Software Vulnerabilities using Self Organizing Map (SOM)


The Common Vulnerabilities and Exposures (CVE) database was created with a mission to "identify, define, and catalog publicly disclosed cybersecurity vulnerabilities". This rich body of information can be used to enable rapid and efficient response to secure and defend cyber operations and protect critical cyber infrastructure. The main goal of this paper is to develop a visual analytics tool that enables deep analysis of CVEs using unsupervised clustering techniques. We enhance our analysis by first mapping CVEs to hierarchical classes in the Common Weakness Enumeration (CWE) using information in the National Vulnerability Database (NVD). Both the mapping and the numerical representation of CVEs are enabled by V2W-BERT, which applies natural language processing to the extensive information in NVD to generate a large tabular database of 137,226 CVE entries from 1999 to 2020, where each CVE is represented by a vector of 768 numerical features. The vectorized data is processed by Self-Organizing Maps (SOM), an unsupervised machine learning technique for dimensionality reduction, visual representation, and clustering. Using a torus map of 6,417 units, we achieve ~10-fold data compression of ~140k CVEs using SOM. The trained map is further clustered using standard K-means clustering into 138 clusters of CVEs. We conducted a brief investigation of the resulting mappings: from CVEs to best-matching units to K-means clusters, and from CVEs to CWEs. For example, this novel mapping provided insight into the role of CWE-59 and CWE-264 in several CVEs that is otherwise hard to explore in the original data. We conclude that this novel approach will not only enable deep analysis of the complex relationships between CVEs and CWEs, but also provide a mechanism to quickly respond to, and design mitigation actions for, rapidly evolving vulnerabilities that have not been mapped to existing CWEs.
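
The two-stage pipeline described in the abstract (train a SOM on a wrap-around grid, then run K-means on the trained unit weights) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library, toy sizes (a 3x3 map, 4 features, 2 clusters) in place of the paper's 6,417 units, 768 features, and 138 clusters, and a linear decay schedule chosen for simplicity. All function names and parameters here are hypothetical.

```python
import math
import random


def torus_dist2(a, b, rows, cols):
    """Squared grid distance between units a and b on a wrap-around (torus) map."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # wrap vertically
    dc = min(dc, cols - dc)  # wrap horizontally
    return dr * dr + dc * dc


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bmu(weights, x):
    """Index of the best-matching unit (closest weight vector) for sample x."""
    return min(range(len(weights)), key=lambda u: sq_dist(weights[u], x))


def train_som(data, rows, cols, epochs=5, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighborhood on a torus grid."""
    rng = random.Random(seed)
    units = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in units]  # init from samples
    sigma0 = max(rows, cols) / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                  # assumed linear decay
            sigma = max(sigma0 * (1 - frac), 0.5)  # assumed linear decay
            winner = units[bmu(weights, x)]
            for u, pos in enumerate(units):
                d2 = torus_dist2(pos, winner, rows, cols)
                h = math.exp(-d2 / (2 * sigma * sigma))
                w = weights[u]
                for i in range(len(x)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return units, weights


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-in for the CVE feature vectors: two synthetic blobs in 4 dimensions.
rng = random.Random(1)
data = [[rng.gauss(c, 0.1) for _ in range(4)]
        for c in (0.0, 5.0) for _ in range(30)]

units, weights = train_som(data, rows=3, cols=3)
unit_cluster = kmeans(weights, k=2)  # stage 2: cluster the trained map
# Each sample inherits the cluster of its best-matching unit.
sample_cluster = [unit_cluster[bmu(weights, x)] for x in data]
```

Clustering the unit weights rather than the raw samples is what makes the second stage cheap: K-means sees one vector per map unit instead of one per CVE, and each CVE is assigned a cluster through its best-matching unit.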


Panchal K., S. Das, L.F. De La Torre Quintana, J.H. Miller, R.J. Rallo Moya, and M. Halappanavar. 2022. Efficient Clustering of Software Vulnerabilities using Self Organizing Map (SOM). In IEEE International Symposium on Technologies for Homeland Security (HST 2022), November 14-15, 2022, Virtual, Online, 1-7. Piscataway, New Jersey: IEEE. PNNL-SA-174297. doi:10.1109/HST56032.2022.10025443