In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
Revised: December 2, 2016 |
Published: September 22, 2016
Citation
Mackey P.S., and R.R. Lewis. 2016.Parallel k-means++ for Multiple Shared-Memory Architectures. In 45th International Conference on Parallel Processing (ICPP 2016), August 16-19, 2016, Philadelphia, PA, 93-102. Piscataway, New Jersey:IEEE.PNNL-SA-116273.doi:10.1109/ICPP.2016.18