Economic competitiveness and national security depend
increasingly on the insightful analysis of large data sets.
The diversity of real-world data sources and analytic workflows
imposes challenging hardware and software requirements on
parallel graph platforms. The irregular nature of graph methods
is not supported well by the deep memory hierarchies of
conventional distributed systems, requiring new processor and
runtime system designs to tolerate memory and synchronization
latencies. Moreover, the efficiency of relational table operations
and matrix computations is not attainable when data is stored
in common graph data structures. In this paper, we present HAGGLE,
a high-performance, scalable data analytics platform. The
platform’s hybrid data model supports a variety of distributed,
thread-safe data structures, parallel programming constructs,
and persistent and streaming data. An abstract runtime layer
enables us to map the stack to conventional, distributed computer
systems with accelerators. The runtime uses multithreading,
active messages, and data aggregation to hide memory and
synchronization latencies on large-scale systems.
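The abstract names active messages and data aggregation as latency-hiding techniques but does not describe how they are implemented. The following is a minimal, single-process sketch of the general idea, assuming a simple threshold-based flush policy: small messages bound for the same destination rank are buffered and shipped as one batch, and each message names a handler that runs on the receiving rank. All names here (`Aggregator`, `post`, `flush_all`, `deliver`, the `"inc"` handler) are hypothetical and are not HAGGLE's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# An active message names a handler to run on the destination rank plus a payload.
# (Hypothetical representation for this sketch, not HAGGLE's format.)
Message = Tuple[str, int]


class Aggregator:
    """Hypothetical per-destination buffer that batches small active messages."""

    def __init__(self, threshold: int,
                 send: Callable[[int, List[Message]], None]) -> None:
        self.threshold = threshold  # flush a buffer once it holds this many messages
        self.send = send            # transport callback: ships one batch to one rank
        self.buffers: Dict[int, List[Message]] = defaultdict(list)

    def post(self, dest: int, msg: Message) -> None:
        # Queue the message instead of sending it immediately.
        buf = self.buffers[dest]
        buf.append(msg)
        if len(buf) >= self.threshold:
            self.flush(dest)

    def flush(self, dest: int) -> None:
        # Send whatever is buffered for `dest` as a single batch.
        batch = self.buffers.pop(dest, [])
        if batch:
            self.send(dest, batch)

    def flush_all(self) -> None:
        # Drain partially filled buffers, e.g. at the end of a phase.
        for dest in list(self.buffers):
            self.flush(dest)


# --- Single-process simulation of 4 ranks counting vertex in-degrees ---
NUM_RANKS = 4
local_counts: List[Dict[int, int]] = [defaultdict(int) for _ in range(NUM_RANKS)]
sends: List[Tuple[int, int]] = []  # (destination, batch size) per simulated network send


def increment(rank: int, vertex: int) -> None:
    # Handler executed on the rank that owns `vertex`.
    local_counts[rank][vertex] += 1


HANDLERS: Dict[str, Callable[[int, int], None]] = {"inc": increment}


def deliver(dest: int, batch: List[Message]) -> None:
    # Stand-in for a network transfer: record it, then run each message's handler.
    sends.append((dest, len(batch)))
    for name, payload in batch:
        HANDLERS[name](dest, payload)


agg = Aggregator(threshold=8, send=deliver)
edges = [(u, (7 * u + 3) % 64) for u in range(64)]
for _, v in edges:
    agg.post(v % NUM_RANKS, ("inc", v))  # owner of v is rank v % NUM_RANKS
agg.flush_all()

print(f"{len(edges)} messages, {len(sends)} sends")  # prints "64 messages, 8 sends"
```

Here 64 fine-grained updates travel in 8 batches of 8. The threshold trades per-message latency for fewer, larger transfers; a production runtime would likely also flush on a timeout or when a thread goes idle, and would overlap sends with other lightweight threads' work rather than blocking.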
Revised: January 16, 2020
Published: March 25, 2019
Citation
Castellana V.G., M. Drocco, J.T. Feo, J. Firoz, T.A. Kanewala, A. Lumsdaine, and J.B. Manzano Franco, et al. 2019. A Parallel Graph Environment for Real-World Data Analytics Workflows. In Design, Automation & Test in Europe Conference & Exhibition (DATE 2019), March 25-29, 2019, Florence, Italy, 1313-1318. Piscataway, New Jersey: IEEE. PNNL-SA-140268. doi:10.23919/DATE.2019.8715196