Patch clustering for massive data sets

被引:18
作者
Alex, Nikolai [2 ]
Hasenfuss, Alexander [1 ]
Hammer, Barbara [1 ]
机构
[1] Tech Univ Clausthal, Dept Informat, D-38678 Clausthal Zellerfeld, Germany
[2] Univ Appl Sci Braunschweig Wolfenbuttel, Dept Comp Sci, D-38302 Wolfenbuttel, Germany
关键词
Neural gas; Clustering streaming data; Parallelization; SELF-ORGANIZING MAPS; VECTOR QUANTIZATION; ALGORITHM;
D O I
10.1016/j.neucom.2008.12.026
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The presence of huge data sets poses new problems to popular clustering and visualization algorithms such as neural gas (NG) and the self-organising-map (SOM) due to memory and time constraints. In such situations, it is no longer possible to store all data points in the main memory at once and only a few, ideally only one run over the whole data set is still affordable to achieve a feasible training time. In this contribution we propose single pass extensions of the classical clustering algorithms NG and SOM which are based on a simple patch decomposition of the data set and fast batch optimization schemes of the underlying cost function. The algorithms only require a fixed memory space. They maintain the benefits of the original ones including easy implementation and interpretation as well as large flexibility and adaptability. We demonstrate that parallelization of the methods becomes easily possible and we show the efficiency of the approach in a variety of experiments. (C) 2009 Elsevier B.V. All rights reserved.
引用
收藏
页码:1455 / 1469
页数:15
相关论文
共 40 条
[1]  
ANCONA IF, 1996, PARALLEL APPROACH PL
[2]  
[Anonymous], 1979, Computers and Intractablity: A Guide to the Theory of NP-Completeness
[3]  
[Anonymous], 2005, PASCAL WORKSH STAT O
[4]   A framework for statistical clustering with a constant time approximation algorithms for K-median clustering [J].
Ben-David, S .
LEARNING THEORY, PROCEEDINGS, 2004, 3120 :415-426
[5]  
BENDAVID S, 2005, NOTION STABILITY STA
[6]  
Bottou L., 1995, Advances in Neural Information Processing Systems 7, P585
[7]   Stability and generalization [J].
Bousquet, O ;
Elisseeff, A .
JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (03) :499-526
[8]  
Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9
[9]  
Carpenter G. A., 2003, HDB BRAIN THEORY NEU, P87, DOI DOI 10.1016/J.NEUNET.2012.09.017]
[10]   Batch and median neural gas [J].
Cottrell, Marie ;
Hammer, Barbara ;
Hasenfuss, Alexander ;
Villmann, Thomas .
NEURAL NETWORKS, 2006, 19 (6-7) :762-771