Custom Accessors: Enabling Scalable Data Ingestion, (Re-)Organization, and Analysis on Distributed Systems

May 15, 2025

Conference Paper

Abstract

The emerging class of high-velocity, high-volume data analytics workflows comprises interwoven data ingestion, organization, and processing stages, with ingestion and organization steps often contributing comparable or even higher computational costs than the actual processing steps. Since complex workflows consist of a variety of phases that view and use data differently, being able to construct efficient, scalable, distributed data structures (arrays, vectors, sets, maps, and multi-maps) is essential and requires custom methods to extend and shrink containers, analyze and position data, and maintain globally consistent metadata. In this paper, we propose a novel data-structure access paradigm based on the concept of Accessors. At a high level, accessors are customizable callable objects that can modify the behavior of insert, read, update, and delete operations for distributed containers while preserving atomicity guarantees. Accessors provide a clean and natural way to implement a variety of programming patterns, e.g., conditional insertion/deletion and cascading computations, which would otherwise be hard (or even impossible) to express in parallel and distributed settings without using locks. We demonstrate the practicality and usefulness of our approach with two representative use cases and study the performance of these applications on a distributed High-Performance Computing system. Our analysis highlights that the proposed abstraction allows different workflow steps (e.g., data ingestion and analysis) to overlap and execute concurrently, whereas in a conventional analytics pipeline they would execute sequentially, each adding to the overall latency.
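To make the accessor idea concrete, the following is a minimal single-process sketch in Python. It is not the paper's implementation or API: the class ShardedMap, its apply method, and the helpers keep_min, drop_if_below, and ingest are hypothetical names invented for illustration, and each in-memory shard merely stands in for one node's partition of a distributed container. The sketch assumes that an accessor is a callable that receives the current value for a key (or None if absent) and returns the value to store (or None to delete), and that the container runs it atomically with respect to other operations on that key.

```python
import threading


class ShardedMap:
    """Single-process stand-in for a distributed map (hypothetical, for illustration).

    Each shard plays the role of one node's partition. User code never takes
    a lock; it only hands an accessor to the container.
    """

    def __init__(self, num_shards=4):
        self._shards = [dict() for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _shard_of(self, key):
        return hash(key) % len(self._shards)

    def apply(self, key, accessor):
        """Run accessor(current_value_or_None) atomically for this key.

        The returned value is stored; returning None removes the key.
        """
        i = self._shard_of(key)
        with self._locks[i]:
            shard = self._shards[i]
            new_value = accessor(shard.get(key))
            if new_value is None:
                shard.pop(key, None)
            else:
                shard[key] = new_value
            return new_value

    def get(self, key):
        i = self._shard_of(key)
        with self._locks[i]:
            return self._shards[i].get(key)


def keep_min(candidate):
    """Accessor for conditional insertion: store candidate only if it is smaller."""
    def accessor(current):
        if current is None or candidate < current:
            return candidate
        return current
    return accessor


def drop_if_below(threshold):
    """Accessor for conditional deletion: remove the entry if its value is too small."""
    def accessor(current):
        if current is None or current < threshold:
            return None
        return current
    return accessor


def ingest(table, records, num_threads=4):
    """Ingest (key, value) records concurrently, organizing data as it arrives."""
    def worker(chunk):
        for key, value in chunk:
            table.apply(key, keep_min(value))

    threads = [
        threading.Thread(target=worker, args=(records[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this sketch the per-key minimum is computed while records are still being ingested, rather than in a separate pass after ingestion completes, which is the kind of overlap between workflow steps the abstract describes. The container uses an internal lock per shard here only to emulate atomicity on one machine; how a real distributed runtime provides that guarantee is outside the scope of this example.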

Published: May 15, 2025

Citation

Castellana V.G., B. Mutlu, I. Di Dio Lavore, J.S. Firoz, K.E. Wolf, M. Minutoli, and J.T. Feo. 2024. Custom Accessors: Enabling Scalable Data Ingestion, (Re-)Organization, and Analysis on Distributed Systems. In IEEE International Conference on Big Data (BigData 2024), December 15-18, 2024, Washington, D.C., 189-198. Piscataway, New Jersey: IEEE. PNNL-SA-205896. doi:10.1109/BigData62323.2024.10825020

Research topics

High-Performance Computing
