Graphs are an increasingly popular data
model for data analytics, since they naturally represent relationships and interactions between entities. Relational databases
and their purely table-based data model are not well suited to
storing and processing sparse data. Consequently, graph databases
have gained interest in recent years, and the Resource
Description Framework (RDF) has become the standard data model
for graph data. Nevertheless, while RDF is well suited to analyzing
the relationships between entities, it is not efficient at
representing their attributes and properties. In this work we
propose a new hybrid data model, based on
attributed graphs, that aims to overcome the limitations of
the pure relational and graph data models. We present how we
have redesigned the GEMS data-analytics framework to take full
advantage of the proposed hybrid data model. To improve
analyst productivity, in addition to a C++ API for application
development, we adopt GraQL as the input query language. We
validate our approach by implementing a set of queries on net-flow
data and compare our framework's performance against Neo4j.
Experimental results show significant performance improvements
over Neo4j, up to several orders of magnitude as the size of
the input data grows.
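To make the attribute problem concrete, here is a minimal sketch contrasting a triple-only encoding with an attributed-graph encoding of the same flows. It is not GEMS's C++ API or GraQL. The net-flow schema (`bytes`, `packets`, `port`), the sample IP addresses, and every helper name (`to_rdf_triples`, `AttributedGraph`, `heavy_flows_rdf`, `heavy_flows_graph`) are hypothetical. The triple side assumes one common encoding in which each flow becomes its own node. Under that assumption every attribute costs an extra triple, and filtering on an attribute needs a join back to the endpoint triples. In the attributed graph the attributes sit on the edge itself.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical net-flow records: (source IP, destination IP, edge attributes).
FLOWS = [
    ("10.0.0.1", "10.0.0.2", {"bytes": 1200, "packets": 10, "port": 443}),
    ("10.0.0.1", "10.0.0.3", {"bytes": 80, "packets": 1, "port": 53}),
    ("10.0.0.2", "10.0.0.3", {"bytes": 5000, "packets": 40, "port": 443}),
]


def to_rdf_triples(flows):
    """Encode flows as (subject, predicate, object) tuples.

    A plain triple cannot carry attributes, so each flow becomes its own
    node (flow0, flow1, ...) linked to its endpoints, plus one triple per
    attribute.
    """
    triples = []
    for i, (src, dst, attrs) in enumerate(flows):
        flow = f"flow{i}"
        triples.append((flow, "srcIP", src))
        triples.append((flow, "dstIP", dst))
        for key, value in attrs.items():
            triples.append((flow, key, value))
    return triples


@dataclass
class AttributedGraph:
    """Graph whose vertices and edges both carry a property map."""
    vertex_props: dict[str, dict[str, Any]] = field(default_factory=dict)
    edges: list[tuple[str, str, dict[str, Any]]] = field(default_factory=list)

    def add_edge(self, src: str, dst: str, props: dict[str, Any]) -> None:
        # Create empty property maps for endpoints seen for the first time.
        self.vertex_props.setdefault(src, {})
        self.vertex_props.setdefault(dst, {})
        self.edges.append((src, dst, props))


def heavy_flows_rdf(triples, min_bytes):
    # Step 1: find flow nodes whose "bytes" triple passes the filter.
    heavy = {s for s, p, o in triples if p == "bytes" and o >= min_bytes}
    # Step 2: join back to the endpoint triples of those flow nodes.
    src = {s: o for s, p, o in triples if p == "srcIP" and s in heavy}
    dst = {s: o for s, p, o in triples if p == "dstIP" and s in heavy}
    return sorted((src[f], dst[f]) for f in heavy)


def heavy_flows_graph(graph, min_bytes):
    # Attributes live on the edge, so the filter is a single edge scan.
    return sorted((s, d) for s, d, props in graph.edges
                  if props["bytes"] >= min_bytes)


if __name__ == "__main__":
    triples = to_rdf_triples(FLOWS)
    graph = AttributedGraph()
    for s, d, attrs in FLOWS:
        graph.add_edge(s, d, attrs)
    print(len(triples), len(graph.edges))  # 15 triples vs. 3 edges
    print(heavy_flows_rdf(triples, 1000))
    print(heavy_flows_graph(graph, 1000))
```

Both queries return the same two heavy flows. The triple version touches three predicate groups to answer the question, while the attributed graph answers it in one pass over the edges. This toy comparison says nothing about the performance figures reported in the paper.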
Revised: April 23, 2019
Published: May 29, 2017
Citation
Castellana V.G., M. Minutoli, S. Bhatt, K. Agarwal, J.T. Feo, D.G. Chavarria Miranda, and D.J. Haglin. 2017. High-Performance Data Analytics Beyond the Relational and Graph Data Models with GEMS. In IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2017), May 29-June 2, 2017, Orlando, Florida, 1029-1038. Piscataway, New Jersey: IEEE. PNNL-SA-124655. doi:10.1109/IPDPSW.2017.70