October 12, 2009
Conference Paper

Two-stage Framework for Visualization of Clustered High Dimensional Data

Abstract

In this paper, we discuss 2D visualization methods for high dimensional data that are clustered and for which the associated label information is available. We propose a two-stage framework for visualizing such data based on dimension reduction methods. In the first stage, we obtain reduced dimensional data using a supervised dimension reduction method, such as linear discriminant analysis, that preserves the original cluster structure in terms of its criterion. The resulting optimal reduced dimension depends on the optimization criterion and is often larger than 2. In the second stage, to further reduce the dimension to 2 for visualization purposes, we apply another dimension reduction method, such as principal component analysis, that minimizes the distortion in the lower dimensional representation of the data obtained in the first stage. Using this framework, we propose several two-stage methods and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.
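The two-stage idea described in the abstract can be illustrated with a minimal sketch, assuming classical linear discriminant analysis for the first stage and principal component analysis for the second. The function names (`lda_reduce`, `pca_reduce`, `two_stage`), the small regularization term `reg`, and the synthetic Gaussian clusters are all assumptions made for illustration; they are not taken from the paper, which proposes several two-stage variants not reproduced here.

```python
import numpy as np

def lda_reduce(X, labels, reg=1e-6):
    """Stage 1 (assumed: classical LDA): project onto k - 1 discriminant directions."""
    classes = np.unique(labels)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        centered = Xc - mc
        Sw += centered.T @ centered
        diff = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Small ridge term so Sw is invertible (an assumption, not from the paper).
    Sw += reg * np.eye(d)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    # Keep the k - 1 directions with the largest eigenvalues.
    order = np.argsort(-evals.real)[: len(classes) - 1]
    G = evecs[:, order].real
    return X @ G

def pca_reduce(Y, dim=2):
    """Stage 2: PCA via SVD of the centered stage-1 output."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt[:dim].T

def two_stage(X, labels, dim=2):
    """Supervised reduction to k - 1 dimensions, then PCA down to `dim`."""
    return pca_reduce(lda_reduce(X, labels), dim)

# Synthetic example: 4 labeled clusters in 10 dimensions (illustrative data only).
rng = np.random.default_rng(0)
k, per_cluster, d = 4, 30, 10
centers = rng.normal(scale=5.0, size=(k, d))
X = np.vstack([c + rng.normal(size=(per_cluster, d)) for c in centers])
labels = np.repeat(np.arange(k), per_cluster)

Z = two_stage(X, labels)
print(Z.shape)  # (120, 2)
```

With four clusters, the first stage yields a 3-dimensional representation, which is larger than 2, so the second stage is what produces the final 2D coordinates for plotting.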

Revised: September 21, 2010 | Published: October 12, 2009

Citation

Choo J., S.J. Bohn, and H. Park. 2009. Two-stage Framework for Visualization of Clustered High Dimensional Data. In IEEE Symposium on Visual Analytics Science and Technology (VAST 2009), 67-74. Piscataway, New Jersey: IEEE. PNNL-SA-65520. doi:10.1109/VAST.2009.5332629