June 19, 2013
Conference Paper

Massive Scale Cyber Traffic Analysis: A Driver for Graph Database Research

Abstract

We describe the significance and prominence of network traffic analysis (TA) as a graph- and network-theoretical domain for advancing research in graph database systems. TA involves observing and analyzing the connections between clients, servers, hosts, and actors within IP networks, both at particular times and over extended periods. Towards that end, NetFlow (or more generically, IPFLOW) data, which summarize coherent groups of IP packets flowing through the network, are available from routers and servers. IPFLOW databases are routinely interrogated statistically and visualized for suspicious patterns. But the ability to cast IPFLOW data as a massive graph and query it interactively, for example to identify connectivity patterns, is less well advanced, due to a number of factors including scaling and the data's hybrid nature, which combines graph connectivity with quantitative attributes. In this paper, we outline requirements and opportunities for graph-structured IPFLOW analytics based on our experience with real IPFLOW databases. Specifically, we describe real use cases from the security domain, cast them as graph patterns, show how to express them in two graph-oriented query languages, SPARQL and Datalog, and use these examples to motivate a new class of "hybrid" graph-relational systems.

Revised: July 24, 2013 | Published: June 19, 2013

Citation

Joslyn C.A., S. Choudhury, D.J. Haglin, B. Howe, W.K. Nickless, and B.K. Olsen. 2013. Massive Scale Cyber Traffic Analysis: A Driver for Graph Database Research. In First International Workshop on Graph Data Management Experiences and Systems (GRADES 2013), held in conjunction with the ACM SIGMOD/PODS Conference, June 22-27, 2013, New York, Article No. 3. New York, New York: ACM. PNNL-SA-94818. doi:10.1145/2484425.2484428