January 28, 2025
Conference Paper

Labeling sequential data from noisy annotations

Abstract

Crowdsourcing algorithms often assume that data samples are independent. Recent work has shown that data dependence, such as temporal correlation in sequential data, can be leveraged to improve label quality. Existing methods that exploit this structure rely on third-order statistics of the annotator outputs, which are costly to acquire, to ensure the identifiability of key latent parameters. This work proposes an approach for integrating crowdsourced annotations of sequential data under the Dawid-Skene/Hidden Markov Model (DS-HMM) based on second-order statistics, which naturally enjoys a lower sample complexity. An effective algorithm is proposed to tackle the challenging optimization problem associated with the proposed estimator. Numerical experiments showcase the effectiveness of the proposed data labeling paradigm.
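To make the setting in the abstract concrete, the following minimal sketch simulates data from a DS-HMM-style model and computes one second-order statistic: the co-occurrence matrix of two annotators' outputs. It is an illustration only, not the estimator or algorithm proposed in the paper. All names and values (`simulate_ds_hmm`, `pairwise_cooccurrence`, the transition matrix, the confusion matrices) are assumptions chosen for the example.

```python
import numpy as np

def simulate_ds_hmm(T, trans, init, confusions, rng):
    """Sample true labels from a Markov chain, then noisy annotations from
    per-annotator confusion matrices (Dawid-Skene observation model)."""
    K = trans.shape[0]
    labels = np.empty(T, dtype=int)
    labels[0] = rng.choice(K, p=init)
    for t in range(1, T):
        # trans[i, j] = P(y_t = j | y_{t-1} = i)
        labels[t] = rng.choice(K, p=trans[labels[t - 1]])
    # conf[k, c] = P(annotator reports c | true label is k)
    annotations = np.stack([
        np.array([rng.choice(K, p=conf[y]) for y in labels])
        for conf in confusions
    ])
    return labels, annotations  # shapes (T,) and (M, T)

def pairwise_cooccurrence(annotations, m, n, K):
    """Empirical second-order statistic: R[i, j] estimates
    P(annotator m reports i, annotator n reports j)."""
    R = np.zeros((K, K))
    np.add.at(R, (annotations[m], annotations[n]), 1.0)
    return R / annotations.shape[1]

# Hypothetical two-class example with temporally "sticky" labels.
rng = np.random.default_rng(0)
K, T = 2, 5000
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
init = np.array([0.5, 0.5])
confusions = [
    np.array([[0.8, 0.2], [0.3, 0.7]]),  # annotator 0
    np.array([[0.7, 0.3], [0.1, 0.9]]),  # annotator 1
    np.array([[0.6, 0.4], [0.4, 0.6]]),  # annotator 2
]
labels, annotations = simulate_ds_hmm(T, trans, init, confusions, rng)

# Annotators are conditionally independent given the true label, so the
# pairwise statistic factors as C_m^T diag(pi) C_n, with pi the label frequencies.
R_hat = pairwise_cooccurrence(annotations, 0, 1, K)
pi_hat = np.bincount(labels, minlength=K) / T
R_model = confusions[0].T @ np.diag(pi_hat) @ confusions[1]
```

In this sketch, `R_hat` involves only pairs of annotator outputs, which is what "second-order" refers to here. A third-order statistic would instead count joint outcomes of three annotators, giving a K x K x K tensor with more entries to estimate from the same number of samples.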

Published: January 28, 2025

Citation

Marrinan T.P., S. Ibrahim, and X. Fu. 2024. Labeling sequential data from noisy annotations. In IEEE 13th Sensor Array and Multichannel Signal Processing Workshop (SAM 2024), July 8-11, 2024, Corvallis, OR, 1-5. Piscataway, New Jersey: IEEE. PNNL-SA-200058. doi:10.1109/SAM60225.2024.10636383

Research topics