A hybrid clustering technique combining a novel genetic algorithm with K-Means

被引：121

作者：

Rahman, Md Anisur ^{[1
]}

Islam, Md Zahidul ^{[1
]}

机构：

[1] Charles Sturt Univ, Ctr Res Complex Syst, Sch Comp & Math, Bathurst, NSW 2795, Australia

来源：

KNOWLEDGE-BASED SYSTEMS | 2014年 / 71卷

关键词：

Clustering; Genetic algorithm; K-Means; Cluster evaluation; Data mining; PARTICLE SWARM OPTIMIZATION; VALIDITY INDEXES; EXPRESSION DATA; CLASSIFICATION;

D O I：

10.1016/j.knosys.2014.08.011

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Many existing clustering techniques including K-Means require a user input on the number of clusters. It is often extremely difficult for a user to accurately estimate the number of clusters in a data set. The genetic algorithms (GAs) generally determine the number of clusters automatically. However, they typically choose the genes and the number of genes randomly. If we can identify the right genes in the initial population then GAs have better possibility to produce a high quality clustering result than the case when we randomly choose the genes. We propose a novel GA based clustering technique that is capable of automatically finding the right number of clusters and identifying the right genes through a novel initial population selection approach. With the help of our novel fitness function, and gene rearrangement operation it produces high quality cluster centers. The centers are then fed into K-Means as initial seeds in order to produce an even higher quality clustering solution by allowing the initial seeds to readjust as needed. Our experimental results indicate a statistically significant superiority (according to the sign test analysis) of our technique over five recent techniques on twenty natural data sets used in this study based on six evaluation criteria. (C) 2014 Elsevier B.V. All rights reserved.

引用

页码：345 / 365

页数：21

共 46 条

[1] A k-mean clustering algorithm for mixed numeric and categorical data [J].

Ahmad, Amir ;

Dey, Lipika .

DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) :503-527

[2]

Alam S., 2014, SWARM EVOL COMPUT

[3]

[Anonymous], 2011, Pei. data mining concepts and techniques

[4] Modified global k-means algorithm for minimum sum-of-squares clustering problems [J].

Bagirov, Adil M. .

PATTERN RECOGNITION, 2008, 41 (10) :3192-3199

[5] Fast modified global k-means algorithm for incremental cluster construction [J].

Bagirov, Adil M. ;

Ugon, Julien ;

Webb, Dean .

PATTERN RECOGNITION, 2011, 44 (04) :866-876

[6] An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data [J].

Bai, Liang ;

Liang, Jiye ;

Dang, Chuangyin .

KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) :785-795

[7] An evolutionary technique based on K-Means algorithm for optimal clustering in RN [J].

Bandyopadhyay, S ;

Maulik, U .

INFORMATION SCIENCES, 2002, 146 (1-4) :221-237

[8] Genetic clustering for automatic evolution of clusters and application to image classification [J].

Bandyopadhyay, S ;

Maulik, U .

PATTERN RECOGNITION, 2002, 35 (06) :1197-1208

[9] Nonparametric genetic clustering: Comparison of validity indices [J].

Bandyopadhyay, S ;

Maulik, U .

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2001, 31 (01) :120-125

[10] Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values [J].

Bhattacharya, Anindya ;

De, Rajat K. .

JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (04) :560-568

← 1 2 3 4 5 →