Text streams, collections of documents or messages that are generated and observed over time, are ubiquitous. Our research and development aims to produce algorithms that find and characterize changes in topic within text streams. To date, this research has demonstrated the ability to detect and describe 1) short-duration, atypical events and 2) the emergence of longer-term shifts in topical content. This technology has been applied to predefined, temporally ordered document collections but is also suitable for near-real-time textual data streams. The underlying event and emergence detection algorithms have been integrated into an event detection application named SURPRISE, which provides an interactive graphical user interface and tools for manipulating and correlating the terms and scores identified by the algorithms. Additionally, SURPRISE has been interfaced with the IN-SPIRE text analytics tool so that an analyst can evaluate the surprising or emerging terms via a visualization of the entire document collection. IN-SPIRE assists in the exploration of related topics, events, and views, which are currently based on single-term events. The focus of this research is to contribute to detecting, and preventing, strategic surprise.
Revised: June 23, 2010
Published: February 10, 2009
Citation
Engel D.W., P.D. Whitney, A.J. Calapristi, and F.J. Brockman. 2009. Mining for Emerging Technologies within Text Streams and Documents. In Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009: Proceedings in Applied Mathematics, 3, 1291-1301. Philadelphia, Pennsylvania: SIAM. PNNL-SA-64618.