Improving hierarchical cluster analysis: A new method with outlier detection and automatic clustering

Cited by: 139
Authors
Almeida, J. A. S. [1]
Barbosa, L. M. S. [1]
Pais, A. A. C. C. [1]
Formosinho, S. J. [1]
Affiliation
[1] Univ Coimbra, Dept Quim, P-3004535 Coimbra, Portugal
Keywords
clustering; unsupervised pattern recognition; hierarchical cluster analysis; single linkage; outlier removal
DOI
10.1016/j.chemolab.2007.01.005
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Techniques based on agglomerative hierarchical clustering constitute one of the most frequent approaches in unsupervised clustering. Some are based on the single linkage methodology, which has been shown to produce good results with sets of clusters of various sizes and shapes. However, the application of this type of algorithm in a wide variety of fields has posed a number of problems, such as sensitivity to outliers and to fluctuations in the density of data points. Additionally, these algorithms do not usually allow for automatic clustering. In this work we propose a method to improve single linkage hierarchical cluster analysis (HCA), so as to circumvent most of these problems and attain the performance of the most sophisticated new approaches. This completely automated method is based on a self-consistent outlier reduction approach, followed by the building-up of a descriptive function. This function, in turn, allows natural clusters to be defined. Finally, the discarded objects may optionally be assigned to these clusters. The method is validated on widely used data sets available from the literature and on others created by the authors for specific purposes. Our method is shown to be very efficient in a large variety of situations. © 2007 Elsevier B.V. All rights reserved.
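The pipeline outlined in the abstract (outlier reduction, single linkage, automatic choice of clusters) can be illustrated with a minimal sketch. It is not the authors' algorithm: the outlier rule (nearest-neighbour distance above `factor` times the median) and the cut rule (largest jump between consecutive merge heights) are simple stand-ins assumed here in place of the self-consistent outlier reduction and the descriptive function, whose details the abstract does not give. All function names and the `factor` parameter are hypothetical.

```python
import math


def drop_outliers(points, factor=3.0):
    """Assumed rule: flag points whose nearest-neighbour distance
    exceeds factor * median nearest-neighbour distance."""
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    median = sorted(nn)[len(nn) // 2]
    kept = [i for i, d in enumerate(nn) if d <= factor * median]
    dropped = [i for i, d in enumerate(nn) if d > factor * median]
    return kept, dropped


def components(n, edges):
    """Union-find over (distance, i, j) edges taken in the given order.
    Returns cluster labels and the edges that actually merged two groups."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    used = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            used.append((d, i, j))
    relabel = {}
    labels = [relabel.setdefault(find(i), len(relabel)) for i in range(n)]
    return labels, used


def auto_single_linkage(points, factor=3.0):
    """Single linkage on the retained points, cut at the largest jump in
    merge height (assumed heuristic). Discarded points get label -1."""
    kept, _ = drop_outliers(points, factor)
    core = [points[i] for i in kept]
    n = len(core)
    edges = sorted((math.dist(core[i], core[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    # Single-linkage merges are the minimum spanning tree edges, in order.
    _, merges = components(n, edges)
    heights = [d for d, _, _ in merges]
    result = [-1] * len(points)
    if len(heights) < 2:
        labels = [0] * n  # too few points to look for a jump
    else:
        gaps = [b - a for a, b in zip(heights, heights[1:])]
        cut = gaps.index(max(gaps)) + 1  # keep merges before the jump
        labels, _ = components(n, merges[:cut])
    for idx, lab in zip(kept, labels):
        result[idx] = lab
    return result


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(auto_single_linkage(points))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy example the isolated point at (50, 50) is discarded before clustering and keeps the label -1. The optional final step mentioned in the abstract, assigning discarded objects to the clusters found, is left out of the sketch.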
Pages: 208-217
Number of pages: 10