Scalable Visual Analytics of Massive Textual Datasets

April 1, 2007

Conference Paper

Abstract

This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools help information analysts interact with and understand large volumes of textual information through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte datasets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.

Revised: June 15, 2007 | Published: April 1, 2007

Citation

Krishnan M., S.J. Bohn, W.E. Cowley, V.L. Crow, and J. Nieplocha. 2007. Scalable Visual Analytics of Massive Textual Datasets. In IPDPS 2007. IEEE International Parallel and Distributed Processing Symposium, 26-30 March 2007, Long Beach, CA, USA, 10 pages. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. PNNL-SA-52302. doi:10.1109/IPDPS.2007.370232