Due to globalization, geographic boundaries
no longer serve as effective shields against the
spread of infectious diseases. To aid
bio-surveillance analysts in disease tracking,
recent research has been devoted to developing
information retrieval and analysis methods
utilizing the vast corpora of publicly available
documents on the internet. In this work,
we present methods for the automated retrieval
and classification of documents related to active
public health events. We demonstrate classification
performance on an automatically generated
corpus using recurrent neural network (RNN), TF-IDF,
and Naive Bayes log-count ratio document
representations. By jointly modeling the
title and description of a document, we achieve
97% recall and 93.3% accuracy with our best-performing
bio-surveillance event classification
model: logistic regression on the combined
output from a pair of bidirectional recurrent
neural networks.
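The abstract names Naive Bayes log-count ratio representations without defining them. The following is a minimal sketch, assuming the common formulation: binarize each document's term occurrences into a vector x, sum these vectors plus a smoothing constant alpha to get p (event-related documents) and q (all other documents), compute r = log((p / ||p||_1) / (q / ||q||_1)), and represent a document as r multiplied element-wise by x. The function names, the alpha default, and the toy tokens and labels are illustrative assumptions, not the paper's implementation.

```python
import math

def binarize(tokens, vocab):
    """Return a 0/1 vector marking which vocabulary terms occur in the token list."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocab]

def nb_log_count_ratio(docs, labels, vocab, alpha=1.0):
    """Compute r = log((p/||p||_1) / (q/||q||_1)) from binarized term occurrences.

    p and q are alpha-smoothed sums of the binarized vectors of the positive
    (label 1) and negative (label 0) documents, respectively.
    """
    p = [alpha] * len(vocab)
    q = [alpha] * len(vocab)
    for tokens, label in zip(docs, labels):
        target = p if label == 1 else q
        for i, xi in enumerate(binarize(tokens, vocab)):
            target[i] += xi
    p_norm, q_norm = sum(p), sum(q)  # L1 norms (all entries are positive)
    return [math.log((pi / p_norm) / (qi / q_norm)) for pi, qi in zip(p, q)]

def nb_features(tokens, vocab, r):
    """Scale the binarized document vector element-wise by the log-count ratio."""
    return [ri * xi for ri, xi in zip(r, binarize(tokens, vocab))]

# Hypothetical toy corpus: 1 = related to a public health event, 0 = unrelated.
docs = [["outbreak", "cholera", "cases"], ["measles", "outbreak"],
        ["stock", "market"], ["football", "market"]]
labels = [1, 1, 0, 0]
vocab = sorted({t for d in docs for t in d})
r = nb_log_count_ratio(docs, labels, vocab)
features = nb_features(["cholera", "market"], vocab, r)
```

Terms that occur mostly in event-related documents (here "outbreak") receive positive weights and terms from unrelated documents (here "market") receive negative ones, so a linear classifier trained on these features starts from a Naive Bayes-style prior over terms.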
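The best model in the abstract applies logistic regression to the combined output of two bidirectional RNNs, one for the title and one for the description. The sketch below covers only that final step. It assumes each field has already been encoded as a fixed-length vector and that "combined" means concatenation. The BiRNN encoders are not implemented, and the toy 2-dimensional vectors are hypothetical stand-ins for their outputs. The logistic regression is fit with plain full-batch gradient descent on the log loss.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def combine(title_vec, desc_vec):
    """Jointly represent a document by concatenating its title and description vectors."""
    return list(title_vec) + list(desc_vec)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (event-related) if the predicted probability meets the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)

# Hypothetical encoder outputs for four documents (title and description fields).
titles = [[1.0, 0.1], [0.9, 0.2], [0.1, 0.9], [0.2, 1.0]]
descs = [[0.8, 0.0], [1.0, 0.3], [0.0, 0.7], [0.3, 0.9]]
y = [1, 1, 0, 0]
X = [combine(t, d) for t, d in zip(titles, descs)]
w, b = train_logreg(X, y)
preds = [predict(w, b, x) for x in X]
```

Concatenation keeps the title and description features in separate weight slots, so the classifier can learn how much evidence each field contributes.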
Published: July 23, 2021
Citation
Tuor A.R., L.E. Charles, and F. Anubhav. 2018. Multiple Document Representations from News Alerts for Automated Bio-surveillance Event Detection. In 5th Pacific Northwest Regional NLP Workshop: NW-NLP 2018, April 27, 2018, Redmond, WA, Paper No. arXiv:1902.06231. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-131548. doi:10.48550/arXiv.1902.06231