Sample-weighted clustering methods

Cited by: 24
Authors
Yu, Jian [2 ]
Yang, Miin-Shen [1 ]
Lee, E. Stanley [3 ]
Affiliations
[1] Chung Yuan Christian Univ, Dept Appl Math, Chungli 32023, Taiwan
[2] Beijing Jiaotong Univ, Dept Comp Sci, Beijing 100044, Peoples R China
[3] Kansas State Univ, Dept Ind & Mfg Syst Engn, Manhattan, KS 66506 USA
Keywords
Cluster analysis; Maximum entropy principle; k-means; Fuzzy c-means; Sample weights; Robustness; FUZZY C-MEANS; CONVERGENCE PROPERTIES; MEAN SHIFT; ALGORITHM; QUANTIZATION; SELECTION;
DOI
10.1016/j.camwa.2011.07.005
Chinese Library Classification (CLC)
O29 [Applied Mathematics];
Discipline classification code
070104;
Abstract
Although there has been considerable research on cluster analysis with feature (or variable) weights, little effort has been devoted to sample weights in clustering. In practice, not every sample in a data set has the same importance for cluster analysis, so it is of interest to obtain proper sample weights for clustering a data set. In this paper, we consider a probability distribution over the data set to represent its sample weights, and we then apply the maximum entropy principle to compute these sample weights automatically for clustering. This approach can generate sample-weighted versions of most clustering algorithms, such as k-means, fuzzy c-means (FCM) and expectation-maximization (EM). The proposed sample-weighted clustering algorithms are robust for data sets with noise and outliers. Furthermore, we analyze the convergence properties of the proposed algorithms. The study also uses numerical and real data sets for demonstration and comparison. Experimental results and comparisons demonstrate that the proposed sample-weighted clustering algorithms are effective and robust clustering methods. (C) 2011 Elsevier Ltd. All rights reserved.
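Only the abstract is available in this record, so the exact objective and update equations of the paper are not reproduced here. The sketch below illustrates the general idea the abstract describes: sample weights treated as a probability distribution over the data and obtained from a maximum entropy argument, which makes weights decay exponentially with a sample's distance to its nearest cluster center so that outliers contribute little to the center updates. The function name sample_weighted_kmeans, the entropy-regularization parameter gamma, and the specific weight formula w_j ∝ exp(-d_j / gamma) are illustrative assumptions, not the authors' published formulation.

```python
import numpy as np

def sample_weighted_kmeans(X, k, gamma=1.0, n_iter=100, seed=0):
    """Minimal sketch of a sample-weighted k-means (assumed formulation).

    Assumed entropy-regularized objective (not stated verbatim in this record):
        min_{centers, w}  sum_j w_j * min_i ||x_j - c_i||^2 + gamma * sum_j w_j * log(w_j)
        s.t.  sum_j w_j = 1,  w_j >= 0
    The maximum entropy principle then yields w_j proportional to
    exp(-d_j / gamma), where d_j is the squared distance of sample j to its
    nearest center, so noisy samples and outliers receive small weights.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=k, replace=False)]   # random initial centers
    weights = np.full(n, 1.0 / n)                        # start from uniform sample weights

    for _ in range(n_iter):
        # squared Euclidean distances from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        d_min = d2[np.arange(n), labels]

        # maximum-entropy sample weights: w_j proportional to exp(-d_min_j / gamma)
        w = np.exp(-(d_min - d_min.min()) / gamma)       # shift by the minimum for numerical stability
        weights = w / w.sum()

        # update each center as the weighted mean of its assigned samples
        new_centers = centers.copy()
        for i in range(k):
            mask = labels == i
            if weights[mask].sum() > 0:
                new_centers[i] = np.average(X[mask], axis=0, weights=weights[mask])

        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return centers, labels, weights
```

A larger gamma in this sketch flattens the weights toward uniform (recovering ordinary k-means), while a smaller gamma concentrates the weights on samples close to the centers, which is one way to read the robustness claim in the abstract.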
Pages: 2200-2208
Number of pages: 9
Related papers
34 records in total
[1] Anderson, Edgar. Bulletin of the American Iris Society, 1935, 59: 2.
[2] Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms. 1981.
[3] Bezdek, J. C.; Hathaway, R. J.; Sabin, M. J.; Tucker, W. T. Convergence theory for fuzzy c-means: counterexamples and repairs. IEEE Transactions on Systems, Man, and Cybernetics, 1987, 17(5): 873-877.
[4] Breaban, Mihaela; Luchian, Henri. A unifying criterion for unsupervised clustering and feature selection. Pattern Recognition, 2011, 44(4): 854-865.
[5] Celebi, M. Emre. Fast color quantization using weighted sort-means clustering. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2009, 26(11): 2434-2443.
[6] Cheng, Y. Z. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(8): 790-799.
[7] Freund, Y.; Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[8] Gentile, C. P TUT 13 INT C COMP, 2000.
[9] Huang, J. Z. X.; Ng, M. K.; Rong, H. Q.; Li, Z. C. Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(5): 657-668.
[10] Hung, Wen-Liang; Yang, Miin-Shen; Chen, De-Hua. Bootstrapping approach to feature-weight selection in fuzzy c-means algorithms with an application in color image segmentation. Pattern Recognition Letters, 2008, 29(9): 1317-1325.