The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. The Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by repeatedly splitting cover elements according to a statistical test for normality. Our algorithm is based on $G$-means clustering, which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model in order to carefully choose the cover based on the distribution of the given data. Experiments on synthetic and real-world datasets demonstrate that our algorithm generates covers for which the Mapper graphs retain the essence of the datasets.
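The splitting idea described above can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function names, the `threshold`, `overlap`, and `min_points` parameters, the split-point rule, and the choice to split every failing interval in each round are all assumptions made for illustration. The Anderson-Darling statistic and the two-component Gaussian mixture fit are written directly in NumPy so the example stays self-contained.

```python
import math
import numpy as np

def ad_statistic(x):
    """Anderson-Darling A^2 statistic for normality (mean and variance estimated)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    cdf = np.clip(cdf, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(cdf) + np.log(1.0 - cdf[::-1])))

def fit_two_gaussians(x, iters=100):
    """Fit a 2-component 1D Gaussian mixture with plain EM; returns sorted means, sigmas."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75])          # hypothetical initialization
    sigma = np.full(2, x.std() + 1e-9)
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    order = np.argsort(mu)
    return mu[order], sigma[order]

def refine_cover(values, intervals, threshold=2.0, overlap=0.1, min_points=20, max_rounds=10):
    """Split intervals whose points fail the normality check (illustrative rule only)."""
    values = np.asarray(values, dtype=float)
    intervals = list(intervals)
    for _ in range(max_rounds):
        refined, changed = [], False
        for a, b in intervals:
            pts = values[(values >= a) & (values <= b)]
            if len(pts) < min_points or np.ptp(pts) == 0 or ad_statistic(pts) <= threshold:
                refined.append((a, b))       # looks Gaussian enough (or too small): keep
                continue
            mu, sigma = fit_two_gaussians(pts)
            # Assumed split point: between the two means, weighted by the sigmas
            c = mu[0] + sigma[0] / sigma.sum() * (mu[1] - mu[0])
            pad = overlap * (mu[1] - mu[0])  # assumed overlap rule
            refined += [(a, min(c + pad, b)), (max(c - pad, a), b)]
            changed = True
        intervals = refined
        if not changed:
            break
    return intervals

# Usage: lens values with two well-separated modes versus a single mode
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(-5, 0.5, 200), rng.normal(5, 0.5, 200)])
unimodal = rng.normal(0, 1, 300)
bimodal_cover = refine_cover(bimodal, [(bimodal.min(), bimodal.max())])
unimodal_cover = refine_cover(unimodal, [(unimodal.min(), unimodal.max())])
```

In this sketch, a single interval over the bimodal lens values is split into two overlapping intervals, while the interval over the unimodal values is left alone. A cover produced this way would then be passed to a Mapper implementation in place of a uniform cover.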
Published: August 20, 2025
Citation
Alvarado E., R. Belton, E. Fischer, K. Lee, S. Palande, S. Percival, and E. Purvine. 2025. G-Mapper: Learning a Cover in the Mapper Construction. SIAM Journal on Mathematics of Data Science 7, no. 2:572-596. PNNL-SA-189983. doi:10.1137/24M1641312