A distributed framework for trimmed Kernel k-Means clustering

被引:36
作者
Tsapanos, Nikolaos [1 ]
Tefas, Anastasios [1 ]
Nikolaidis, Nikolaos [1 ]
Pitas, Ioannis [1 ]
机构
[1] Aristotle Univ Thessaloniki, GR-54006 Thessaloniki, Greece
关键词
Data clustering; Face clustering; Kernel k-Means; Distributed computing;
D O I
10.1016/j.patcog.2015.02.020
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. Kernel k-Means is a state of the art clustering algorithm. However, in contrast to clustering algorithms that can work using only a limited percentage of the data at a time, Kernel k-Means is a global clustering algorithm. It requires the computation of the kernel matrix, which takes O(n(2)d) time and O(n(2)) space in memory. As datasets grow larger, the application of Kernel k-Means becomes infeasible on a single computer, a fact that strongly suggests a distributed approach. In this paper, we present such an approach to the Kernel k-Means clustering algorithm, in order to make its application to a large number of samples feasible and, thus, achieve high performance clustering results on very big datasets. Our distributed approach follows the MapReduce programming model and consists of 3 stages, the kernel matrix computation, a novel kernel matrix trimming method and the Kernel k-Means clustering algorithm. (C) 2015 Elsevier Ltd. All rights reserved.
引用
收藏
页码:2685 / 2698
页数:14
相关论文
共 44 条
[11]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
[12]  
Dhillon I.S., 2004, A unified view of kernel k-means, spectral clustering and graph cuts
[13]  
Dhillon I. S., 2004, P 10 ACM SIGKDD INT, P551, DOI DOI 10.1145/1014052.1014118
[14]  
Dhillon IS, 2007, IEEE T PATTERN ANAL, V29, P1944, DOI 10.1109/TP'AMI.2007.1115
[15]  
Ene A., 2011, P 17 ACM KDD, P681, DOI DOI 10.1145/2020408.2020515
[16]  
Everingham Mark, 2006, BMVC, DOI DOI 10.5244/C.20.92
[17]  
Fanti C, 2004, ADV NEUR IN, V16, P1603
[18]  
Ferreira Cordeiro RobsonLeonardo., 2011, P 17 ACM SPECIAL INT, P690, DOI DOI 10.1145/2020408.2020516
[19]   Kernel-based hard clustering methods in the feature space with automatic variable weighting [J].
Ferreira, Marcelo R. P. ;
de Carvalho, Francisco de A. T. .
PATTERN RECOGNITION, 2014, 47 (09) :3082-3095
[20]   A survey of kernel and spectral methods for clustering [J].
Filippone, Maurizio ;
Camastra, Francesco ;
Masulli, Francesco ;
Rovetta, Stefano .
PATTERN RECOGNITION, 2008, 41 (01) :176-190