CLUSTERING WITHOUT A METRIC

Cited by: 18
Authors
MATTHEWS, G
HEARNE, J
Affiliation
[1] Department of Computer Science, Western Washington University, Bellingham, WA
Keywords
CLUSTERING; CLUSTER VALIDITY; MULTIVARIATE DATA; PROXIMITY INDEXES; UNSUPERVISED LEARNING
DOI
10.1109/34.67646
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We describe a methodology for clustering data that does not use a distance metric or similarity function. Instead, clusterings are optimized for their intended function: accurate prediction of properties of the data. The resulting methodology is applicable, without further ad hoc assumptions or transformations of the data, 1) when features are heterogeneous (both discrete and continuous) and cannot be combined, 2) when some data points have missing feature values, and 3) when some features are irrelevant, i.e., have large variance but little correlation with other features. It also provides an integral measure of the quality of the resulting clustering. We have implemented a clustering program, RIFFLE, based on this approach, and experiments with synthetic and real data show that its clusterings are, in many respects, superior to those produced by traditional methods.
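The central idea of the abstract, scoring a clustering by how well cluster membership predicts feature values instead of by distances between points, can be illustrated with a small sketch. This is not RIFFLE, whose actual objective and search procedure are not given in this record. The function names `prediction_score` and `hill_climb`, the Goodman-Kruskal lambda-style "proportional reduction in error" score, and the single-point-move local search are all hypothetical choices made for illustration. The sketch handles discrete features only (continuous features are assumed to be binned first), skips missing values instead of imputing them, and never combines features into a distance.

```python
import random
from collections import Counter

# Hypothetical data model: each row is a list of discrete (hashable) feature
# values; None marks a missing value. Continuous features are assumed to have
# been binned beforehand; this sketch does not handle them directly.


def prediction_score(rows, labels, k):
    """Average, over features, of the proportional reduction in prediction
    error obtained by guessing each value as its cluster's modal value rather
    than the overall modal value (a Goodman-Kruskal lambda-style measure).
    No distance between rows is ever computed."""
    n_features = len(rows[0])
    total = 0.0
    for f in range(n_features):
        overall = Counter()
        per_cluster = [Counter() for _ in range(k)]
        for row, c in zip(rows, labels):
            value = row[f]
            if value is None:  # missing values are simply skipped
                continue
            overall[value] += 1
            per_cluster[c][value] += 1
        n = sum(overall.values())
        if n == 0:
            continue  # feature entirely missing: contributes nothing
        errors_without = n - max(overall.values())
        errors_with = sum(
            sum(counts.values()) - max(counts.values())
            for counts in per_cluster
            if counts  # empty clusters make no predictions
        )
        if errors_without > 0:  # constant features contribute nothing
            total += (errors_without - errors_with) / errors_without
    return total / n_features


def hill_climb(rows, k, seed=0, max_passes=20):
    """Hypothetical greedy local search: move single points between clusters
    whenever the move raises prediction_score, until a pass makes no change."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in rows]
    best = prediction_score(rows, labels, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(rows)):
            for c in range(k):
                if c == labels[i]:
                    continue
                old = labels[i]
                labels[i] = c
                score = prediction_score(rows, labels, k)
                if score > best + 1e-12:
                    best, improved = score, True
                else:
                    labels[i] = old  # undo a move that did not help
        if not improved:
            break
    return labels, best


# Toy data: features 0 and 1 agree on two groups; feature 2 is unrelated to
# them, and some values are missing.
rows = [
    ["a", "x", "p"],
    ["a", "x", None],
    ["a", "x", "q"],
    ["b", "y", "p"],
    ["b", None, "q"],
    ["b", "y", "p"],
]
print(prediction_score(rows, [0, 0, 0, 1, 1, 1], 2))  # about 0.667
print(hill_climb(rows, 2))
```

For the two-group clustering, the first two features are predicted perfectly (a reduction of 1.0 each), while the unrelated third feature contributes 0, so it lowers the average without distorting which clustering scores best. Because splitting a cluster can never increase the number of prediction errors, this particular score never decreases as clusters are refined (one point per cluster scores 1.0 on every non-constant feature). The sketch therefore keeps k fixed and does not attempt to reproduce the quality measure the abstract mentions.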
Pages: 175-184
Number of pages: 10