Improving Scalable K-Means++

https://jyx.jyu.fi/handle/123456789/73628

http://www.urn.fi/URN:NBN:fi:jyu-202101141104

hamalainenym0701.pdf (Jyväskylän yliopisto - JYX)

Article

Hämäläinen, Joonas ; Kärkkäinen, Tommi ; Rossi, Tuomo ; Faculty of Information Technology

MDPI AG 2021

Two new initialization methods for K-means clustering are proposed. Both are based on applying a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. For the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like the random one in very high-dimensional cases.
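For orientation, the K-means++ baseline named in the abstract can be sketched as follows. This is a minimal illustration of standard D² seeding only, not the divide-and-conquer or random projection methods proposed in the article. The function names `sq_dist` and `kmeanspp_seed` are hypothetical and chosen for this example, and the sketch uses only the Python standard library.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from points by D^2 sampling (K-means++ seeding)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Refresh nearest-center distances using the newly added center.
        d2 = [min(old, sq_dist(p, centers[-1])) for p, old in zip(points, d2)]
    return centers
```

A call such as `kmeanspp_seed(points, 2, random.Random(0))` returns two of the input points. Because each step needs the distances to all centers chosen so far, the procedure is inherently sequential in k, which is the limitation that K-means‖ style strategies are designed to address.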

Language

English

Series

Algorithms, 1

Subjects

clustering initialization