Multiple Document Representations from News Alerts for Automated Bio-surveillance Event Detection

July 23, 2021

Conference Paper

Abstract

Due to globalization, geographic boundaries no longer serve as effective shields against the spread of infectious diseases. To aid bio-surveillance analysts in disease tracking, recent research has been devoted to developing information retrieval and analysis methods that draw on the vast corpora of publicly available documents on the internet. In this work, we present methods for the automated retrieval and classification of documents related to active public health events. We demonstrate classification performance on an auto-generated corpus using recurrent neural network, TF-IDF, and Naive Bayes log-count ratio document representations. By jointly modeling the title and description of a document, we achieve 97% recall and 93.3% accuracy with our best-performing bio-surveillance event classification model: logistic regression on the combined output of a pair of bidirectional recurrent neural networks.
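
As a rough illustration of one of the representations named in the abstract, the sketch below computes Naive Bayes log-count ratios separately for titles and descriptions and joins the two per-field scores into one feature vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the code uses the commonly cited smoothed log-count ratio with binarized counts, and the function names (`tokenize`, `log_count_ratio`, `represent`), the smoothing value `alpha`, and the four toy alerts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: smoothed log-count ratio} for binary labels (1 = event)."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # Binarized counts: each token is counted once per document.
        (pos if label == 1 else neg).update(set(tokenize(doc)))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos[t] + alpha for t in vocab)
    neg_total = sum(neg[t] + alpha for t in vocab)
    return {
        t: math.log(((pos[t] + alpha) / pos_total) / ((neg[t] + alpha) / neg_total))
        for t in vocab
    }


def represent(title, description, title_ratios, desc_ratios):
    """Join one score per field into a two-element feature vector."""
    def score(text, ratios):
        # Tokens unseen during fitting contribute nothing.
        return sum(ratios.get(t, 0.0) for t in set(tokenize(text)))
    return [score(title, title_ratios), score(description, desc_ratios)]


# Hypothetical toy alerts, invented purely for illustration.
titles = [
    "cholera outbreak reported",
    "local team wins match",
    "measles outbreak spreads",
    "new stadium opens",
]
descriptions = [
    "officials confirm cases of cholera",
    "fans celebrate the win",
    "health officials confirm new cases",
    "the city opens a new venue",
]
labels = [1, 0, 1, 0]

title_ratios = log_count_ratio(titles, labels)
desc_ratios = log_count_ratio(descriptions, labels)
features = represent(
    "outbreak confirmed", "officials confirm cases", title_ratios, desc_ratios
)
```

In this simplified form each field is reduced to a single summed score. A fuller treatment would keep one weighted feature per vocabulary term and pass the joined title and description vectors to a classifier such as logistic regression, in the spirit of the joint modeling the abstract describes.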

Published: July 23, 2021

Citation

Tuor A.R., L.E. Charles, and F. Anubhav. 2018. Multiple Document Representations from News Alerts for Automated Bio-surveillance Event Detection. In 5th Pacific Northwest Regional NLP Workshop: NW-NLP 2018, April 27, 2018, Redmond, WA, Paper No. arXiv:1902.06231. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-131548. doi:10.48550/arXiv.1902.06231