We present a new version of sequential projection pursuit Principal Component Analysis (sppPCA) that can perform PCA on large multivariate datasets containing non-random missing values. We demonstrate that sppPCA generates more robust and informative low-dimensional representations of the data than imputation-based approaches and that it improves downstream statistical analyses, such as clustering or classification. A Java program to run sppPCA is freely available at https://www.biopilot.org/docs/Software/sppPCA.
Revised: August 16, 2018
Published: March 15, 2013
Citation
Webb-Robertson B.M., M.M. Matzke, T.O. Metz, J.E. McDermott, J. Walker, K.D. Rodland, and J.G. Pounds, et al. 2013. Sequential Projection Pursuit Principal Component Analysis – Dealing with Missing Data Associated with New -Omics Technologies. BioTechniques 54, no. 3:165-168. PNNL-SA-87092.