A novel fuzzy clustering algorithm with between-cluster information for categorical data

Cited by: 35
Authors
Bai, Liang [1, 2]
Liang, Jiye [1 ]
Dang, Chuangyin [2 ]
Cao, Fuyuan [1 ]
Affiliations
[1] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fuzzy clustering; The fuzzy k-modes algorithm; Optimization objective function; Categorical data; VALIDITY;
DOI
10.1016/j.fss.2012.06.005
Chinese Library Classification
TP301 [Theory and Methods];
Discipline classification code
080201 [Mechanical Manufacturing and Automation];
Abstract
In this paper, we present a new fuzzy clustering algorithm for categorical data. The algorithm modifies the objective function of the fuzzy k-modes algorithm by adding between-cluster information, so that it simultaneously minimizes the within-cluster dispersion and enhances the between-cluster separation. To obtain locally optimal solutions of the modified objective function, the corresponding update formulas for the membership matrix and the cluster prototypes are rigorously derived. The convergence of the proposed algorithm under the optimization framework is proved. The performance of the proposed algorithm is studied on several real data sets from UCI. The experimental results show that the algorithm is effective and suitable for categorical data sets. (C) 2012 Elsevier B.V. All rights reserved.
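The abstract does not state the form of the between-cluster term, so the sketch below covers only the baseline fuzzy k-modes algorithm that the paper modifies: alternating membership and mode updates that reduce the within-cluster dispersion under the simple matching dissimilarity. The function names, the fuzziness exponent `alpha`, and the random initialization are assumptions made for illustration; this is not the authors' implementation and it omits their between-cluster information.

```python
import random


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def update_memberships(data, modes, alpha):
    """Baseline fuzzy k-modes membership update for fixed modes (alpha > 1)."""
    memberships = []
    for obj in data:
        dists = [matching_distance(obj, z) for z in modes]
        if 0 in dists:
            # The object coincides with a mode: give it full membership there.
            hit = dists.index(0)
            row = [1.0 if l == hit else 0.0 for l in range(len(modes))]
        else:
            inv = [d ** (-1.0 / (alpha - 1.0)) for d in dists]
            total = sum(inv)
            row = [v / total for v in inv]
        memberships.append(row)
    return memberships


def update_modes(data, memberships, k, alpha):
    """Per cluster and attribute, pick the category with the largest fuzzy weight."""
    n_attrs = len(data[0])
    modes = []
    for l in range(k):
        mode = []
        for j in range(n_attrs):
            weight = {}
            for obj, row in zip(data, memberships):
                weight[obj[j]] = weight.get(obj[j], 0.0) + row[l] ** alpha
            mode.append(max(weight, key=weight.get))
        modes.append(tuple(mode))
    return modes


def objective(data, modes, memberships, alpha):
    """Within-cluster dispersion: sum of membership^alpha times dissimilarity."""
    return sum(
        row[l] ** alpha * matching_distance(obj, modes[l])
        for obj, row in zip(data, memberships)
        for l in range(len(modes))
    )


def fuzzy_k_modes(data, k, alpha=1.5, max_iter=100, seed=0):
    """Alternate the two updates until the modes stop changing (assumed stop rule)."""
    rng = random.Random(seed)
    modes = [tuple(x) for x in rng.sample(list(data), k)]  # assumed initialization
    memberships = update_memberships(data, modes, alpha)
    for _ in range(max_iter):
        new_modes = update_modes(data, memberships, k, alpha)
        if new_modes == modes:
            break
        modes = new_modes
        memberships = update_memberships(data, modes, alpha)
    return modes, memberships
```

Calling `fuzzy_k_modes(data, k=2)` on a list of equal-length tuples of category labels returns the cluster modes and one membership row per object, with each row summing to 1. A version following the paper would replace `objective` and both update functions with the formulas derived there.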
Pages: 55-73
Page count: 19