Automated variable weighting in k-means type clustering

Times cited: 576
Authors
Huang, JZX
Ng, MK
Rong, HQ
Li, ZC
Affiliations
[1] Univ Hong Kong, E Business Technol Inst, Hong Kong, Hong Kong, Peoples R China
[2] Univ Hong Kong, Dept Math, Hong Kong, Hong Kong, Peoples R China
[3] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[4] Henan Polytech Univ, Dept Comp Sci & Technol, Jiaozuo City 454003, Henan Province, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
clustering; data mining; mining methods and algorithms; feature evaluation and selection
DOI
10.1109/TPAMI.2005.95
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update the variable weights based on the current partition of the data, and a formula for the weight calculation is proposed. A convergence theorem for the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used for variable selection in data mining applications, which often involve large and complex real data. Experimental results on both synthetic and real data show that the new algorithm outperforms the standard k-means type algorithms in recovering clusters in data.
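The abstract describes the extra weight-update step but does not state the weight formula itself. The sketch below is a minimal illustration under stated assumptions, not the paper's exact algorithm: it assumes the objective sum over clusters l, objects i and variables j of u_il * w_j^beta * (x_ij - z_lj)^2, with weights that sum to 1 and an exponent beta > 1, and uses the closed-form minimizer w_j = 1 / sum_t (D_j / D_t)^(1/(beta - 1)), where D_j is the within-cluster dispersion of variable j. The function names `weighted_kmeans` and `update_weights`, the squared Euclidean per-variable distance, the random initial centers, the equal initial weights, and the zero-weight convention for variables with D_j = 0 are all assumptions made for illustration.

```python
import numpy as np


def update_weights(D, beta):
    """Hypothetical helper: variable weights from per-variable dispersions D (beta > 1).

    Assumed rule: w_j = 1 / sum_t (D_j / D_t) ** (1 / (beta - 1)) over variables
    with D_t > 0; variables with D_j == 0 get weight 0 (an assumed convention).
    """
    D = np.asarray(D, dtype=float)
    nz = D > 0
    if not nz.any():
        # Degenerate case: no variable varies within clusters, so use equal weights.
        return np.full(D.size, 1.0 / D.size)
    w = np.zeros(D.size)
    Dn = D[nz]
    ratios = (Dn[:, None] / Dn[None, :]) ** (1.0 / (beta - 1.0))
    w[nz] = 1.0 / ratios.sum(axis=1)  # the weights sum to 1 by construction
    return w


def weighted_kmeans(X, k, beta=2.0, max_iter=100, seed=0):
    """Hypothetical weighted k-means loop: assignment, center, then weight updates."""
    if beta <= 1:
        raise ValueError("this sketch only handles beta > 1")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(m, 1.0 / m)  # assumed start: equal variable weights
    labels = None
    for _ in range(max_iter):
        # Assign each object to the center minimizing sum_j w_j**beta * (x_ij - z_lj)**2.
        dist = (((X[:, None, :] - centers[None, :, :]) ** 2) * w ** beta).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # partition unchanged, so centers and weights would not change either
        labels = new_labels
        # Recompute each center as the mean of its members; keep the old center if empty.
        for l in range(k):
            members = X[labels == l]
            if len(members) > 0:
                centers[l] = members.mean(axis=0)
        # Per-variable within-cluster dispersion D_j, then the weight update step.
        D = ((X - centers[labels]) ** 2).sum(axis=0)
        w = update_weights(D, beta)
    return labels, centers, w


# Usage: variable 0 separates two groups, variable 1 is pure noise.
rng = np.random.default_rng(1)
X = np.vstack([
    np.column_stack([rng.normal(0.0, 0.3, 50), rng.normal(0.0, 8.0, 50)]),
    np.column_stack([rng.normal(5.0, 0.3, 50), rng.normal(0.0, 8.0, 50)]),
])
labels, centers, w = weighted_kmeans(X, k=2, beta=2.0)
print(w)  # the informative variable 0 should receive the larger weight
```

Because the weight of a variable shrinks as its within-cluster dispersion grows, the noisy variable is down-weighted in later assignment steps; this is the sense in which such weights can rank variables for selection. As with standard k-means, the result can depend on the initial centers.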
Pages: 657-668
Number of pages: 12