January 28, 2025
Conference Paper
Labeling sequential data from noisy annotations
Abstract
Crowdsourcing algorithms often assume that data samples are independent. Recent work has shown that data dependence, such as temporal correlation in sequential data, can be leveraged to improve label quality. Existing methods that exploit this structure rely on third-order statistics of the annotator outputs, which are costly to acquire, to ensure the identifiability of key latent parameters. This work proposes an approach for integrating crowdsourced annotations of sequential data under the Dawid-Skene/Hidden Markov Model (DS-HMM) that is based on second-order statistics and therefore naturally enjoys lower sample complexity. An effective algorithm is proposed to tackle the challenging optimization problem associated with the proposed estimator. Numerical experiments showcase the effectiveness of the proposed data labeling paradigm.
Published: January 28, 2025
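To make "second-order statistics of the annotator outputs" concrete, here is a minimal sketch under a standard DS-HMM assumption. The true labels y_t form a stationary Markov chain with row-stochastic transition matrix T and stationary distribution π, and annotator m reports a label drawn from column y_t of a column-stochastic confusion matrix A_m. Under these assumptions, the pairwise label co-occurrence of two distinct annotators at the same time step factors as A_m D A_n^T with D = diag(π), and at consecutive time steps as A_m Ω A_n^T with Ω = diag(π) T. The code only simulates such data and checks these factorizations empirically. All function names and parameter values are hypothetical, and this is not the estimator or the algorithm proposed in the paper.

```python
import random


def sample_categorical(probs, rng):
    # Draw an index k with probability probs[k].
    u = rng.random()
    acc = 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def simulate_ds_hmm(prior, transition, confusions, length, rng):
    """Hypothetical DS-HMM simulator.

    transition[i][k] = P(y_{t+1} = k | y_t = i)
    confusions[m][j][k] = P(annotator m reports j | true label k)
    """
    K = len(prior)
    y = [sample_categorical(prior, rng)]
    for _ in range(length - 1):
        y.append(sample_categorical(transition[y[-1]], rng))
    # Each annotator labels every time step, conditionally independent given y.
    x = [[sample_categorical([A[j][yt] for j in range(K)], rng) for yt in y]
         for A in confusions]
    return y, x


def cross_moment(xm, xn, K, lag=0):
    """Empirical E[e_{x_m(t)} e_{x_n(t+lag)}^T] as a K x K nested list."""
    n = len(xm) - lag
    R = [[0.0] * K for _ in range(K)]
    for t in range(n):
        R[xm[t]][xn[t + lag]] += 1.0 / n
    return R


def model_moment(Am, An, joint):
    """Population moment Am @ joint @ An^T, where joint[a][b] is the
    joint probability of the two underlying true labels."""
    K = len(joint)
    return [[sum(Am[i][a] * joint[a][b] * An[j][b]
                 for a in range(K) for b in range(K))
             for j in range(K)] for i in range(K)]


# Example with K = 2 classes and two annotators (illustrative values only).
rng = random.Random(0)
transition = [[0.9, 0.1], [0.2, 0.8]]
prior = [2 / 3, 1 / 3]            # stationary distribution of `transition`
A1 = [[0.8, 0.3], [0.2, 0.7]]     # columns sum to 1
A2 = [[0.7, 0.1], [0.3, 0.9]]
y, x = simulate_ds_hmm(prior, transition, [A1, A2], 50000, rng)

D = [[prior[0], 0.0], [0.0, prior[1]]]                       # P(y_t = a, y_t = b)
Omega = [[prior[a] * transition[a][b] for b in range(2)]     # P(y_t = a, y_{t+1} = b)
         for a in range(2)]

R0 = cross_moment(x[0], x[1], 2, lag=0)   # same time step
R1 = cross_moment(x[0], x[1], 2, lag=1)   # consecutive time steps
M0 = model_moment(A1, A2, D)
M1 = model_moment(A1, A2, Omega)
```

The lag-1 moment is where the temporal structure enters: it depends on the transition matrix through Ω, whereas the same-time moment only sees the class prior. Both are pairwise (second-order) quantities, so each entry can be estimated from label pairs rather than label triples.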