Clustering binary data in the presence of masking variables

Cited by: 31
Author
Brusco, MJ [1]
Affiliation
[1] Florida State Univ, Coll Business, Dept Marketing, Tallahassee, FL 32306 USA
DOI
10.1037/1082-989X.9.4.510
Chinese Library Classification (CLC) number
B84 [Psychology]
Subject classification code
04; 0402
Abstract
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the true cluster structure. The author presents a heuristic procedure that selects an appropriate subset from among the set of all candidate clustering variables. Specifically, this procedure attempts to select only those variables that contribute to the definition of true cluster structure while eliminating variables that can hide (or mask) that true structure. Experimental testing of the proposed variable-selection procedure reveals that it is extremely successful at accomplishing this goal.
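The masking effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the author's variable-selection procedure: it assumes two clusters, six cluster-defining binary variables, and twelve masking variables that are pure coin flips. All names and parameter values (`make_data`, `within_between_ratio`, the probabilities 0.9, 0.1, and 0.5) are hypothetical choices made for illustration. It compares the mean Hamming distance within true clusters to the mean distance between them; a ratio closer to 1 means the clusters are harder to tell apart.

```python
import random
from itertools import combinations


def make_data(rng, n_per_cluster=40, n_true=6, n_mask=12):
    """Generate two clusters of binary objects (illustrative parameters).

    The first n_true columns depend on cluster membership; the remaining
    n_mask columns are independent coin flips, i.e. masking variables.
    """
    rows, labels = [], []
    for label, p_true in ((0, 0.9), (1, 0.1)):
        for _ in range(n_per_cluster):
            true_part = [int(rng.random() < p_true) for _ in range(n_true)]
            mask_part = [int(rng.random() < 0.5) for _ in range(n_mask)]
            rows.append(true_part + mask_part)
            labels.append(label)
    return rows, labels


def within_between_ratio(rows, labels, columns):
    """Mean within-cluster Hamming distance divided by the mean
    between-cluster Hamming distance, using only the given columns."""
    columns = list(columns)
    within, between = [], []
    for i, j in combinations(range(len(rows)), 2):
        d = sum(rows[i][c] != rows[j][c] for c in columns)
        (within if labels[i] == labels[j] else between).append(d)
    return (sum(within) / len(within)) / (sum(between) / len(between))


rng = random.Random(0)  # fixed seed so the run is repeatable
rows, labels = make_data(rng)

ratio_true = within_between_ratio(rows, labels, range(6))   # true variables only
ratio_all = within_between_ratio(rows, labels, range(18))   # true + masking

print(f"true variables only: {ratio_true:.2f}")
print(f"with masking variables: {ratio_all:.2f}")
```

Under these assumptions the ratio rises when the masking variables are included, because they add roughly the same amount of distance to every pair of objects regardless of cluster membership. This is the kind of dilution that a variable-selection step aims to remove before clustering.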
Pages: 510-523
Number of pages: 14