Reference datasets are an essential part of developing effective data exploration tools. Many reference datasets are synthetic, seeding real-world properties into generation routines. Others are laboriously curated from original source materials, a time-consuming process that requires significant domain expertise at every step. This paper presents a third path: deriving a network from real-world source materials using automated tools. By properly grounding the tools in a small collection of curated documents, a larger corpus can be automatically processed with greater confidence. The resulting system still leverages human knowledge (in the initial document curation), but automation greatly reduces the time commitment. Our ultimate goal is to produce reference datasets, derived from real-world data, that capture aspects salient to a specific activity in the context of all of the activities in the dataset. Our reference datasets take the form of graphs.
Revised: March 17, 2020
Published: December 30, 2019
Citation
Chandra Shekar M., and J.A. Cottam. 2019. Graph Generation with a Focusing Lexicon. In IEEE International Conference on Big Data (Big Data 2019), December 9-12, 2019, Los Angeles, CA, 4928-4931. Piscataway, New Jersey: IEEE. PNNL-SA-148002. doi:10.1109/BigData47090.2019.9006568