Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm

Cited by: 110

Authors:

Tsai, Chieh-Yuan [1]

Chiu, Chuang-Cheng [1]

Affiliations:

[1] Yuan Ze Univ, Dept Ind Engn & Management, Chungli 320, Taoyuan County, Taiwan

Source:

COMPUTATIONAL STATISTICS & DATA ANALYSIS | 2008 / Vol. 52 / Issue 10

DOI:

10.1016/j.csda.2008.03.002

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

K-means is one of the most popular and widespread partitioning clustering algorithms due to its superior scalability and efficiency. Typically, the K-means algorithm treats all features equally and assigns them identical weights when evaluating dissimilarity. However, a meaningful clustering structure often occurs in a subspace defined by a specific subset of the features. To address this issue, this paper proposes a novel feature weight self-adjustment (FWSA) mechanism embedded into K-means in order to improve its clustering quality. In the FWSA mechanism, finding feature weights is modeled as an optimization problem that simultaneously minimizes the separations within clusters and maximizes the separations between clusters. With this objective, the adjustment margin of a feature weight can be derived from the importance of the feature to the clustering quality. At each iteration of K-means, all feature weights are adaptively updated by adding their respective adjustment margins. Experiments on a number of synthetic and real data sets show the benefits of the proposed FWSA mechanism. In addition, when compared to a recent similar feature weighting work, the proposed mechanism shows several advantages in both the theoretical and experimental results. (C) 2008 Elsevier B.V. All rights reserved.
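The abstract describes the weight update only in words, so the following Python sketch is an illustration under stated assumptions, not the paper's exact formulation. It assumes squared per-feature distances, an adjustment margin proportional to each feature's between-cluster to within-cluster separation ratio, and a renormalization that keeps the weights summing to 1. The function name `fwsa_kmeans` and its parameters are hypothetical.

```python
import numpy as np

def fwsa_kmeans(X, k, n_iter=20, init=None, seed=0):
    """K-means with per-feature weights re-estimated at every iteration.

    Assumption: the margin of a feature is its share of the
    between/within separation ratio; the abstract does not give the formula.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(n, size=k, replace=False)]
    else:
        centers = np.array(init, dtype=float)
    weights = np.full(d, 1.0 / d)  # start with equal feature weights
    grand_mean = X.mean(axis=0)

    for _ in range(n_iter):
        # Assign each point to the nearest center under weighted squared distance.
        diff = X[:, None, :] - centers[None, :, :]
        labels = ((diff ** 2) * weights).sum(axis=2).argmin(axis=1)

        # Recompute centers; an empty cluster keeps its previous center.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

        # Per-feature separation within clusters and between clusters.
        within = ((X - centers[labels]) ** 2).sum(axis=0)
        counts = np.bincount(labels, minlength=k)
        between = (counts[:, None] * (centers - grand_mean) ** 2).sum(axis=0)

        # Assumed margin: normalized between/within ratio (sums to 1).
        ratio = between / (within + 1e-12)
        if ratio.sum() > 0:
            margin = ratio / ratio.sum()
            # Add the margin to each weight, then renormalize to sum to 1.
            weights = (weights + margin) / 2.0

    return labels, centers, weights
```

On data where only some features separate the clusters, this sketch should shift weight toward those features over the iterations, which matches the subspace motivation stated in the abstract.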

Pages: 4658-4672

Page count: 15
