May 29, 2017
Conference Paper

High-Performance Data Analytics Beyond the Relational and Graph Data Models with GEMS

Abstract

Graphs are an increasingly popular data model for data analytics, since they can naturally represent relationships and interactions between entities. Relational databases and their pure table-based data model are not well suited to storing and processing sparse data. Consequently, graph databases have gained interest in the last few years, and the Resource Description Framework (RDF) has become the standard data model for graph data. Nevertheless, while RDF is well suited to analyzing the relationships between entities, it is not efficient at representing their attributes and properties. In this work we propose the adoption of a new hybrid data model, based on attributed graphs, that aims to overcome the limitations of the pure relational and graph data models. We present how we have redesigned the GEMS data analytics framework to take full advantage of the proposed hybrid data model. To improve analysts' productivity, in addition to a C++ API for application development, we adopt GraQL as the input query language. We validate our approach by implementing a set of queries on net-flow data, and we compare our framework's performance against Neo4j. Experimental results show significant performance improvements over Neo4j, of up to several orders of magnitude as the size of the input data increases.
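
To make the contrast between the two data models concrete, the following is a minimal sketch of an attributed graph holding net-flow records, together with the same data flattened into RDF-style triples. It is an illustration only: the `AttributedGraph` class, the `to_triples` helper, and field names such as `dst_port` and `bytes` are hypothetical, and nothing here reflects the actual GEMS storage layout, its C++ API, or GraQL syntax.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Toy attributed graph: topology plus one attribute record per vertex and edge."""
    vertices: dict = field(default_factory=dict)  # vertex id -> attribute dict
    edges: list = field(default_factory=list)     # (src, dst, attribute dict)

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs

    def add_edge(self, src, dst, **attrs):
        self.edges.append((src, dst, attrs))

    def match_edges(self, predicate):
        """Return (src, dst) pairs whose edge attributes satisfy the predicate."""
        return [(s, d) for s, d, a in self.edges if predicate(a)]


def to_triples(graph):
    """Flatten the same data into RDF-style (subject, predicate, object) triples."""
    triples = []
    for vid, attrs in graph.vertices.items():
        for key, value in attrs.items():
            triples.append((vid, key, value))
    for i, (src, dst, attrs) in enumerate(graph.edges):
        flow = f"flow{i}"  # each flow becomes its own node so it can carry attributes
        triples.append((flow, "src", src))
        triples.append((flow, "dst", dst))
        for key, value in attrs.items():
            triples.append((flow, key, value))
    return triples


# Hypothetical net-flow records: two hosts and two flows between them.
g = AttributedGraph()
g.add_vertex("10.0.0.1", role="client")
g.add_vertex("10.0.0.2", role="server")
g.add_edge("10.0.0.1", "10.0.0.2", dst_port=443, bytes=1200)
g.add_edge("10.0.0.1", "10.0.0.2", dst_port=22, bytes=90000)

# Attribute filtering works directly on the edge records.
large_flows = g.match_edges(lambda a: a["bytes"] > 10000)

# The triple encoding needs one statement per attribute, plus two per flow.
triples = to_triples(g)
```

In this toy encoding, each flow with two attributes expands into four triples, which hints at why a pure triple representation can become costly for attribute-heavy data such as net-flow records.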

Revised: April 23, 2019 | Published: May 29, 2017

Citation

Castellana V.G., M. Minutoli, S. Bhatt, K. Agarwal, J.T. Feo, D.G. Chavarria Miranda, and D.J. Haglin. 2017. High-Performance Data Analytics Beyond the Relational and Graph Data Models with GEMS. In IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2017), May 29-June 2, 2017, Orlando, Florida, 1029-1038. Piscataway, New Jersey: IEEE. PNNL-SA-124655. doi:10.1109/IPDPSW.2017.70