PERFORMANCE AND USABILITY ENHANCEMENTS FOR CONTINUOUS SUBGRAPH MATCHING QUERIES ON GRAPH-STRUCTURED DATA

Patent ID: 9303 | Patent Number: 10,810,210 | Status: Granted

Abstract

StreamWorks is a network analysis framework that enables an analyst to monitor and analyze streaming computer network traffic data to identify emerging computer network intrusions and threats. Different types of graphical query patterns may be defined for specific types of cyberattacks, including various network scans, reflector attacks, flood attacks, viruses, and worms. StreamWorks will support subgraph matching on computer network attributes such as hostnames, IP addresses, protocols, ports, packet sizes, machine types, and message types. The speed of subgraph pattern matching will be accelerated by collecting and utilizing node and edge frequency information to optimize search paths through a massive data graph. Computer network intrusion analysis will involve live computer network data streamed in at high data rates and the analysis of data graphs consisting of millions to billions of edges.
For known patterns, specific graphical query patterns are collected in a library and continuously and efficiently matched against the dynamic graph as it is updated. Each graph query is captured as a subgraph join tree, which decomposes the query graph into smaller search subpatterns. These smaller subpatterns signify precursor events that emerge early, before the full query pattern is complete. As precursor events are detected in data streams, they are matched to the nodes of different subgraph join trees. Matching that occurs higher in a join tree indicates a higher probability that a specific type of attack is occurring. A similarity or confidence score may be computed for a partial match by training on collected computer network traffic data to measure how frequently partial subpatterns occur as precursors to the full graph query pattern. For unknown patterns or zero-day exploits, the same analysis framework may be applied to track the emergence of small subpatterns as they appear in the data stream.
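The join-tree matching described above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a left-deep join tree whose leaves are single query edges, orders them by ascending edge-type frequency as a stand-in for the frequency-based optimization, and uses hypothetical names (`JoinTreeMatcher`, `merge`, `type_freq`) and made-up edge types. It also does not guard against joins between subpatterns that share no variable, which a real planner would avoid.

```python
from typing import Dict, List, Optional, Tuple

Binding = Dict[str, str]          # query variable -> data node
QueryEdge = Tuple[str, str, str]  # (source variable, edge type, target variable)


def merge(a: Binding, b: Binding) -> Optional[Binding]:
    """Join two bindings; None if they conflict or map two variables to one node."""
    for var, node in b.items():
        if var in a and a[var] != node:
            return None
    merged = {**a, **b}
    if len(set(merged.values())) != len(merged):
        return None
    return merged


class JoinTreeMatcher:
    """Hypothetical left-deep join tree over single-edge subpatterns."""

    def __init__(self, query_edges: List[QueryEdge], type_freq: Dict[str, int]):
        # Assumption: rarest edge type first, so the most selective
        # subpattern sits at the bottom of the tree.
        self.order = sorted(query_edges, key=lambda e: type_freq.get(e[1], 0))
        # leaves[i]: matches of query edge i alone.
        self.leaves: List[List[Binding]] = [[] for _ in self.order]
        # partial[i]: matches of the first i + 1 query edges joined together.
        self.partial: List[List[Binding]] = [[] for _ in self.order]

    def add_edge(self, src: str, etype: str, dst: str) -> List[Binding]:
        """Process one streamed edge; return any newly completed full matches."""
        complete: List[Binding] = []
        for j, (q_src, q_type, q_dst) in enumerate(self.order):
            if q_type != etype:
                continue
            binding = merge({q_src: src}, {q_dst: dst})
            if binding is None:
                continue
            self.leaves[j].append(binding)
            if j == 0:
                new = [binding]
            else:
                new = [m for p in self.partial[j - 1]
                       if (m := merge(p, binding)) is not None]
            level = j
            # Cascade upward: join new partial matches with stored leaf matches.
            while new:
                self.partial[level].extend(new)
                if level == len(self.order) - 1:
                    complete.extend(new)
                    break
                level += 1
                new = [m for p in new for leaf in self.leaves[level]
                       if (m := merge(p, leaf)) is not None]
        return complete


# Illustrative query and frequencies (invented for this example).
query = [("a", "dns", "b"), ("b", "login", "c"), ("c", "exfil", "d")]
freq = {"dns": 1000, "login": 50, "exfil": 2}
matcher = JoinTreeMatcher(query, freq)
for edge in [("h1", "dns", "h2"), ("h2", "login", "h3"), ("h3", "exfil", "h4")]:
    for match in matcher.add_edge(*edge):
        print("full match:", match)
```

One consequence of ordering by rarity is that partial matches accumulate only after the rarest edge has been seen. That keeps intermediate state small, but common precursor edges stay in the leaf stores until then.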
The system may be seeded with hints to look for small graph patterns that involve rare events (based on collected statistics), events involving critical resources such as an authentication server, domain name server, or database, or particular host machines of specific suspicion or interest to analysts. When seeded subpatterns are found in the data stream, they are tracked and monitored within subgraph join trees. Here, subpatterns are joined based on specific criteria, such as when the subpatterns grow beyond a certain threshold size, additional critical resources are introduced into a subpattern, or important types of interactions or communications are detected. Thus, full attack patterns may dynamically emerge from the small seeded patterns or hints. The initial seeded patterns may have confidence scores generated from collected statistics or assigned by analysts, which are then propagated up through the subgraph join tree. Additionally, StreamWorks will provide mechanisms for analysts to vet tracked subpatterns so as to improve analysis and performance by removing benign patterns from monitoring and assessment.
The advanced dynamic graph algorithms have been packaged into a streaming network analysis framework known as StreamWorks. With StreamWorks, a scientist or analyst may detect and identify precursor events and patterns as they emerge in complex networks. This analysis framework is intended to be used in a dynamic environment where network data is streamed in and appended to a large-scale dynamic graph. An interactive graph query construction tool has been developed that will allow an analyst to build a query graph. Various cyberattack templates have been developed for querying the dynamic graph, where an analyst may tailor the attributes of a cyberattack query by adjusting parameters of the cyberattack template.
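As a rough illustration of a parameterized cyberattack template, the sketch below builds a star-shaped query graph for a port scan and checks whether an observed flow satisfies an edge's attribute constraints. The function names (`port_scan_template`, `satisfies`), the parameters, and the attribute keys are assumptions made for this example; the abstract does not describe the actual template format.

```python
from typing import Dict, List, Tuple

Constraint = Dict[str, object]
QueryEdge = Tuple[str, Constraint, str]  # (source variable, constraints, target variable)


def port_scan_template(num_targets: int = 3,
                       protocol: str = "tcp",
                       port: int = 22) -> List[QueryEdge]:
    """Hypothetical template: one scanner contacting several distinct targets."""
    constraint = {"protocol": protocol, "port": port}
    return [("scanner", dict(constraint), f"target{i}")
            for i in range(1, num_targets + 1)]


def satisfies(constraint: Constraint, observed: Dict[str, object]) -> bool:
    """True if every constrained attribute has the required value in the flow."""
    return all(observed.get(key) == value for key, value in constraint.items())


# An analyst tailors the template by adjusting its parameters.
query = port_scan_template(num_targets=2, port=445)
flow = {"protocol": "tcp", "port": 445, "packet_size": 60}
print(satisfies(query[0][1], flow))
```

Attributes that the template does not constrain, such as `packet_size` here, are ignored during matching, so a single template can cover many concrete flows.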
The dynamic results, which are the subpatterns of the template that are matched in the dynamic graph, are returned to the analyst in a visualization showing the emerging and evolving patterns along with a visualization of the subgraph join tree containing statistics on the level of matching per partial subgraph pattern.
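The confidence scoring and join-tree statistics mentioned in the abstract can be sketched as follows. The abstract does not state how scores are combined, so two assumptions are made here. A leaf's score is the empirical fraction of training occurrences in which the subpattern preceded the full pattern, or a value assigned by an analyst. A parent combines its children with a noisy-OR rule. The names `JoinNode`, `estimate_confidence`, `propagate`, and `render`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JoinNode:
    name: str
    children: List["JoinNode"] = field(default_factory=list)
    seed_confidence: Optional[float] = None  # set on seeded leaves only


def estimate_confidence(times_seen: int, times_completed: int) -> float:
    """Fraction of training occurrences that preceded the full pattern."""
    return times_completed / times_seen if times_seen else 0.0


def propagate(node: JoinNode) -> float:
    """Assumed noisy-OR rule: a parent is at least as confident as any child."""
    if not node.children:
        return node.seed_confidence or 0.0
    miss = 1.0
    for child in node.children:
        miss *= 1.0 - propagate(child)
    return 1.0 - miss


def render(node: JoinNode, depth: int = 0) -> List[str]:
    """Indented text view of the join tree with a score per node."""
    lines = [f"{'  ' * depth}{node.name}: {propagate(node):.3f}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# One leaf scored from (invented) training counts, one scored by an analyst.
rare_login = JoinNode("rare_login", seed_confidence=estimate_confidence(40, 10))
probe = JoinNode("auth_server_probe", seed_confidence=0.5)
root = JoinNode("full_pattern", [rare_login, probe])
print("\n".join(render(root)))
```

Noisy-OR is only one plausible choice. Taking the maximum child score or using a learned conditional probability per join node would fit the same tree structure.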

Application Number

15/594,376

Inventors

Agarwal, Khushbu
Choudhury, Sutanay
Beus, Sherman J
Chin Jr, George

Market Sector

Data Sciences