The Threat Stream Generator (TSG) project at Pacific Northwest National Laboratory has been developing synthetic datasets to test and evaluate visual analytics tools for the past four years. Our activities have ranged from supporting the evaluation of major U.S. Government analytical frameworks to creating four datasets for the IEEE Visual Analytics Science and Technology (VAST) contest over the past two years. We have developed a reasonable method and supporting toolset for creating believable synthetic datasets for different uses. A key differentiator for our datasets is that they contain data concerning one or more invented threats, based on a scenario. Embedding a known threat into the data provides ground truth against which the performance of analytic tools can be evaluated, and it gives evaluation researchers new opportunities to explore techniques that depend on ground truth being available. We describe the process of creating the scenarios and threats and of transforming them into data elements, and then we describe how this data is embedded in other data to form a TSG dataset.
Revised: August 18, 2009
Published: April 10, 2008
Citation
Whiting M.A., J.N. Haack, and C.F. Varley. 2008. Creating realistic, scenario-based synthetic data for test and evaluation of information analytics software. In Proceedings of the 2008 conference on BEyond time and errors: novel evaLuation methods for Information Visualization (BELIV 2008), April 5, 2008, Florence, Italy, Article No. 8. New York, New York: Association for Computing Machinery. PNNL-SA-58345.