Technology Overview
A central challenge in information analysis is the lack of time and resources to review large volumes of information, a task that is often impractical with traditional approaches such as lists, tables, and simple graphs. Many traditional text analysis techniques focus on selecting features that distinguish documents within a group. However, these techniques may fail to select features that characterize the majority, or a minor subset, of the documents within that group. In addition, when information is streaming or updated over time, the group is dynamic and can change significantly. Most current tools let information consumers interact only with snapshots of an information space that is often changing continually. Tools are needed to help automatically identify and understand the themes, topics, and trends within these large volumes of information.
To meet this need, PNNL scientists have developed a method for selecting features and for measuring the association between arbitrary pairs of features. Features are selected based on how well they predict themselves, and associations are measured based on how well two features predict each other. The computation of predictive ability leverages a) automatic feature extraction algorithms, such as the Rapid Automatic Keyword Extraction (RAKE) algorithm, which identify the features expressed within individual objects; and b) search functions that identify all objects in a collection in which an arbitrary feature occurs. The PNNL-developed method describes how the feature-object information generated by feature extraction and search can be combined to measure the predictive ability of features for themselves and for each other, thereby improving analytic capabilities that rely on insight into features and feature associations within object collections.
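The overview does not give formulas, so the following is only a minimal sketch of one plausible reading of how extraction results and search results might be combined. It assumes that a feature's ability to predict itself is the fraction of objects containing the feature in which the extractor also selected it, and that feature A's ability to predict feature B is the fraction of objects where A was extracted that also contain B. All function names, the sample documents, and the hand-written extraction results (standing in for the output of an extractor such as RAKE) are hypothetical and are not taken from the PNNL method.

```python
from typing import Dict, Iterable, Set

FeatureIndex = Dict[str, Set[int]]  # feature -> ids of objects


def occurrence_index(docs: Dict[int, str], features: Iterable[str]) -> FeatureIndex:
    """Stand-in search function: ids of all documents whose text contains each feature."""
    return {f: {i for i, text in docs.items() if f in text.lower()} for f in features}


def self_prediction(extracted: FeatureIndex, occurs: FeatureIndex, feature: str) -> float:
    """Assumed measure: share of documents containing the feature where it was also extracted."""
    occ = occurs.get(feature, set())
    return len(extracted.get(feature, set()) & occ) / len(occ) if occ else 0.0


def cross_prediction(extracted: FeatureIndex, occurs: FeatureIndex, a: str, b: str) -> float:
    """Assumed measure: share of documents where `a` was extracted that also contain `b`."""
    ext_a = extracted.get(a, set())
    return len(ext_a & occurs.get(b, set())) / len(ext_a) if ext_a else 0.0


# Hypothetical collection of four short documents.
docs = {
    1: "solar power output rose as solar panel costs fell",
    2: "solar panel installers report strong demand",
    3: "wind power capacity grew; solar power was flat",
    4: "grid storage helps balance wind power",
}

# Hand-written extraction results (assumption): feature -> documents where it was extracted.
extracted = {"solar panel": {1, 2}, "wind power": {3, 4}, "solar power": {1}}

occurs = occurrence_index(docs, extracted)

if __name__ == "__main__":
    for f in extracted:
        print(f, "predicts itself:", self_prediction(extracted, occurs, f))
    print("solar panel -> solar power:",
          cross_prediction(extracted, occurs, "solar panel", "solar power"))
```

Under these assumptions, a feature that is extracted in most of the objects where it occurs scores high as a predictor of itself, while the pairwise score is asymmetric: A may predict B well even when B predicts A poorly.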
Advantages
- Improves analytic capabilities that rely on insight into features and feature associations within information collections
- Provides the ability to identify topics and trends within large volumes of information