A MULTISEED NON-HIERARCHICAL CLUSTERING TECHNIQUE FOR DATA-ANALYSIS

Cited by: 6
Authors
CHAUDHURI, D
CHAUDHURI, BB
Affiliation
[1] Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta, 700 035
DOI
10.1080/00207729508929040
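Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 [Computer Science and Technology];
Abstract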
Clustering methods such as K-means, its variants (e.g., Forgy's method), and the improved version ISODATA do not work well when clusters are elongated. It is pointed out that a single seed point cannot correctly reflect the nature of the data in an elongated cluster. A multiseed clustering algorithm is proposed in which one cluster may contain more than one seed point. A density-based algorithm is used to choose the initial seed points. To assign several seed points to one cluster, a novel merging technique guided by a minimal spanning tree is proposed. The merging technique is quite general and may be applied to other clustering approaches as well. Experimental results are presented to demonstrate the efficiency of this clustering procedure.
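The idea summarized in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not give the seed-selection rule or the merging criterion, so the neighbour-count density, the greedy seed spacing, and the fixed edge-length cutoff below are all assumptions chosen for illustration. The function names and the parameters `radius`, `min_sep`, and `max_edge` are hypothetical.

```python
import math
from itertools import combinations


def pick_seeds(points, radius, min_sep):
    """Greedy density-based seeding (assumed heuristic, not the paper's rule).

    Density of a point = number of points within `radius` of it (itself included).
    Points are visited from densest to sparsest and kept as seeds only if they
    lie at least `min_sep` away from every seed chosen so far.
    """
    density = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if all(math.dist(points[i], s) >= min_sep for s in seeds):
            seeds.append(points[i])
    return seeds


def mst_edges(nodes):
    """Kruskal's algorithm on the complete Euclidean graph over `nodes`."""
    parent = list(range(len(nodes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(nodes[i], nodes[j]), i, j)
        for i, j in combinations(range(len(nodes)), 2)
    )
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def merge_seeds(seeds, max_edge):
    """Give seeds a shared label when an MST edge of length <= max_edge joins them.

    The fixed cutoff `max_edge` is an assumption standing in for the paper's
    merging criterion, which the abstract does not specify.
    """
    label = list(range(len(seeds)))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for w, i, j in mst_edges(seeds):
        if w <= max_edge:
            label[find(i)] = find(j)
    return [find(i) for i in range(len(seeds))]


def multiseed_cluster(points, radius, min_sep, max_edge):
    """Assign each point the merged-cluster label of its nearest seed."""
    seeds = pick_seeds(points, radius, min_sep)
    seed_label = merge_seeds(seeds, max_edge)
    return [
        seed_label[min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))]
        for p in points
    ]
```

On two long, parallel rows of points, a call such as `multiseed_cluster(points, radius=1.0, min_sep=2.0, max_edge=3.0)` places several seeds along each row and then merges the seeds of each row into one cluster, which is the behaviour a single centroid per cluster cannot capture.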
Pages: 375-385
Page count: 11