Clustering objects on subsets of attributes

Cited by: 233
Authors
Friedman, JH [1]
Meulman, JJ [2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Leiden Univ, NL-2300 RA Leiden, Netherlands
Keywords
bioinformatics; clustering on variable subsets; distance-based clustering; feature selection; gene expression microarray data; genomics; inverse exponential distance; mixtures of numeric and categorical variables; targeted clustering
DOI
10.1111/j.1467-9868.2004.02059.x
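Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714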
Abstract
A new procedure is proposed for clustering attribute value data. When used in conjunction with conventional distance-based clustering algorithms this procedure encourages those algorithms to detect automatically subgroups of objects that preferentially cluster on subsets of the attribute variables rather than on all of them simultaneously. The relevant attribute subsets for each individual cluster can be different and partially (or completely) overlap with those of other clusters. Enhancements for increasing sensitivity for detecting especially low cardinality groups clustering on a small subset of variables are discussed. Applications in different domains, including gene expression arrays, are presented.
Pages: 815-839
Page count: 25