Hierarchical clustering of mixed data based on distance hierarchy

Cited by: 77
Authors
Hsu, Chung-Chian [1]
Chen, Chin-Long [1]
Su, Yu-Wei [1]
Affiliation
[1] Natl Yunlin Univ Sci & Technol, Dept Informat Management, Touliu 640, Yunlin, Taiwan
Keywords
categorical data; distance hierarchy; hierarchical clustering; k-means; mixed data
DOI
10.1016/j.ins.2007.05.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data clustering is an important data mining technique that partitions data according to some similarity criterion. Many algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike the subtraction scheme used for numerical attributes, there is no standard for measuring the distance between categorical values. In this article, we propose a distance representation scheme, the distance hierarchy, which facilitates expressing the similarity between categorical values and also unifies the distance measurement of numerical and categorical values. We then apply the scheme to mixed data clustering, in particular by integrating it with a hierarchical clustering algorithm. Consequently, this integrated approach can handle numerical and categorical data uniformly, and also enables one to take the similarity between categorical values into consideration. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity. © 2007 Elsevier Inc. All rights reserved.
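The abstract does not spell out how a distance hierarchy is built, so the following is only a minimal sketch of the general idea, under stated assumptions: categorical values are leaves of a weighted tree, and the distance between two values is the total edge weight on the path joining them. The tree `DRINK_TREE`, its unit weights, the helper names, and the Euclidean combination in `mixed_distance` are all hypothetical and are not taken from the paper.

```python
from math import sqrt

# Hypothetical hierarchy: child -> (parent, edge weight). "Any" is the root.
DRINK_TREE = {
    "Juice": ("Any", 1.0),
    "Carbonated": ("Any", 1.0),
    "Orange": ("Juice", 1.0),
    "Apple": ("Juice", 1.0),
    "Coke": ("Carbonated", 1.0),
    "Pepsi": ("Carbonated", 1.0),
}


def path_to_root(tree, node):
    """Map each ancestor of node (and node itself) to its distance from node."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:
        parent, weight = tree[node]
        dist += weight
        out[parent] = dist
        node = parent
    return out


def tree_distance(tree, a, b):
    """Path length between two values through their lowest common ancestor."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # With positive weights, the lowest common ancestor gives the minimum sum.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def mixed_distance(x, y, tree, max_tree_dist, num_range):
    """Distance between two (category, number) records.

    Assumption: both parts are scaled to [0, 1] and combined Euclidean-style.
    """
    d_cat = tree_distance(tree, x[0], y[0]) / max_tree_dist
    d_num = abs(x[1] - y[1]) / num_range
    return sqrt(d_cat ** 2 + d_num ** 2)
```

In this toy tree, "Orange" and "Apple" share the parent "Juice" and so end up closer to each other than "Orange" and "Coke", which a plain 0/1 match on categorical values would not capture. A pairwise matrix of such distances could then be passed to any agglomerative clustering routine.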
Pages: 4474-4492
Page count: 19