Unsupervised learning with mixed numeric and nominal data

Cited by: 127
Authors
Li, C [1]
Biswas, G [2]
Affiliations
[1] Middle Tennessee State Univ, Dept Comp Sci, Murfreesboro, TN 37132 USA
[2] Vanderbilt Univ, Dept Elect Engn, Nashville, TN 37235 USA
Keywords
agglomerative clustering; conceptual clustering; feature weighting; interpretation; knowledge discovery; mixed numeric and nominal data; similarity measures; χ² aggregation
DOI
10.1109/TKDE.2002.1019208
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a Similarity-Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure proposed by Goodall for biological taxonomy [15], which gives greater weight to uncommon feature value matches in similarity computations and makes no assumptions about the underlying distributions of the feature values, is adopted to define the similarity between pairs of objects. An agglomerative algorithm is employed to construct a dendrogram, and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on real and artificially generated data sets. Results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other clustering schemes illustrate the superior performance of this approach.
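The core idea in the abstract, that a match on an uncommon nominal value counts for more than a match on a common one, can be illustrated with a small sketch. This is not the SBAC algorithm itself: it covers nominal features only, uses one simplified Goodall-style formula, and combines features by a plain mean (an assumption made here for brevity; the keywords indicate the paper uses a χ² aggregation). The function names `goodall_nominal_similarity` and `pairwise_similarity` are hypothetical.

```python
from collections import Counter


def goodall_nominal_similarity(column):
    """Build a similarity function for one nominal feature.

    Simplified Goodall-style weighting (an illustrative assumption, not the
    paper's exact formulation): a match on value v scores
    1 - sum of p2(k) over all values k that are no more frequent than v,
    where p2(k) estimates the chance that two distinct objects both take k.
    Mismatches score 0. Requires at least two objects in the column.
    """
    n = len(column)
    counts = Counter(column)
    # Probability that two distinct sampled objects share value v.
    p2 = {v: c * (c - 1) / (n * (n - 1)) for v, c in counts.items()}

    def sim(a, b):
        if a != b:
            return 0.0
        # Rarer matched values accumulate a smaller penalty, so they score higher.
        penalty = sum(p2[v] for v in counts if counts[v] <= counts[a])
        return 1.0 - penalty

    return sim


def pairwise_similarity(rows):
    """Pairwise object similarity as the mean of per-feature similarities.

    The plain mean is a stand-in chosen for this sketch only.
    """
    columns = list(zip(*rows))
    feature_sims = [goodall_nominal_similarity(col) for col in columns]
    n = len(rows)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(f(a, b) for f, a, b in zip(feature_sims, rows[i], rows[j]))
            matrix[i][j] = matrix[j][i] = total / len(feature_sims)
    return matrix


# Toy data: "red" is common, "blue" less so, "green" unique.
rows = [("red",), ("red",), ("red",), ("blue",), ("blue",), ("green",)]
similarity = pairwise_similarity(rows)
```

In this toy data set, the two "blue" objects come out more similar to each other than any two "red" objects do, because "blue" is the less frequent value. A matrix like `similarity` could then feed any standard agglomerative procedure to build a dendrogram.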
Pages: 673-690
Page count: 18
Related papers
31 entries in total
[1]  
[Anonymous], P 7 ANN C COGN SCI S
[2]   A CONCEPTUAL CLUSTERING-ALGORITHM FOR DATABASE SCHEMA DESIGN [J].
BECK, HW ;
ANWAR, T ;
NAVATHE, SB .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1994, 6 (03) :396-411
[3]   ITERATE: A conceptual clustering algorithm for data mining [J].
Biswas, G ;
Weinberg, JB ;
Fisher, DH .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 1998, 28 (02) :219-230
[4]  
Biswas G., 1995, ARTIF INTELL, P111
[5]  
CHEESMAN P, 1998, P 5 INT C MACH LEARN
[6]  
Cheesman P., 1995, ADV KNOWLEDGE DISCOV
[7]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[8]   Iterative optimization and simplification of hierarchical clusterings [J].
Fisher, D .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1996, 4 :147-179
[9]  
Fisher D. H., 1987, Machine Learning, V2, P139, DOI 10.1007/BF00114265
[10]  
FISHER DH, 1986, P ART INT STAT