K-modes clustering

Cited by: 181

Authors:

Chaturvedi, A

Green, PE

Carroll, JD

Affiliations:

[1] Kraft Gen Foods Inc, Glenview, IL 60025 USA

[2] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA

[3] Rutgers State Univ, Grad Sch Management, Newark, NJ 07102 USA

Source:

JOURNAL OF CLASSIFICATION | 2001 / Vol. 18 / Issue 01

Keywords:

categorical data; cluster analysis; groups; modes; latent class analysis

DOI:

10.1007/s00357-001-0004-3

Chinese Library Classification (CLC) number:

O1 [Mathematics]

Discipline classification codes:

0701; 070101

Abstract:

We present a nonparametric approach to deriving clusters from categorical (nominal scale) data using a new clustering procedure called K-modes, which is analogous to the traditional K-means procedure (MacQueen 1967) for clustering interval scale data. Unlike most existing methods for clustering nominal scale data, the K-modes procedure explicitly optimizes a loss function based on the L0 norm (defined as the limit of an Lp norm as p approaches zero). In Monte Carlo simulations, K-modes and latent class procedures (e.g., Goodman 1974) were equally efficient in recovering a known underlying cluster structure. However, K-modes is an order of magnitude faster than the latent class procedure and suffers from fewer problems of local optima. For data sets involving a large number of categorical variables, latent class procedures become extremely slow computationally and hence infeasible. We conjecture that, although latent class procedures might perform better than K-modes in some cases, K-modes could outperform latent class procedures in others. Hence, we recommend that these two approaches be used as "complementary" procedures in performing cluster analysis. We also present an empirical comparison of K-modes and latent class, where the former method prevails.
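
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming the L0-based loss is the count of variables on which a record disagrees with its cluster's mode vector, and assuming a K-means-style alternation between assigning records and updating modes. The function names (`mismatches`, `column_modes`, `k_modes`), the random initialisation, and the stopping rule are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter
import random


def mismatches(a, b):
    """Count the variables on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent category of each variable across the rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, n_iter=20, seed=0):
    """Alternate assignment and mode updates; return (labels, modes, loss)."""
    rng = random.Random(seed)
    # Assumption: start from k records drawn at random from the data.
    modes = rng.sample(data, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record goes to the mode it mismatches least.
        new_labels = [
            min(range(k), key=lambda c: mismatches(row, modes[c]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: each mode becomes the per-variable mode of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = column_modes(members)
    # Total loss: summed mismatch count between records and their modes.
    loss = sum(mismatches(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, loss
```

Calling `k_modes(data, k)` on a list of equal-length tuples of category labels returns one cluster label per record, the `k` mode vectors, and the total mismatch count. Because the result depends on the random start, such a sketch would typically be run from several seeds, keeping the solution with the smallest loss.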

References

Pages: 35-55

Page count: 21

16 entries in total

[11] COMPARING PARTITIONS
HUBERT, L
ARABIE, P
[J]. JOURNAL OF CLASSIFICATION, 1985, 2 (2-3) : 193 - 218
[12] MacQueen J, 1965, P 5 BERK S MATH STAT, P281
[13] AN INDEX OF GOODNESS-OF-FIT BASED ON NONCENTRALITY
MCDONALD, RP
[J]. JOURNAL OF CLASSIFICATION, 1989, 6 (01) : 97 - 103
[14] A SEQUENTIAL FITTING PROCEDURE FOR LINEAR DATA-ANALYSIS MODELS
MIRKIN, BG
[J]. JOURNAL OF CLASSIFICATION, 1990, 7 (02) : 167 - 195
[15] Joint segmentation on distinct interdependent bases with categorical data
Ramaswamy, V
Chatterjee, R
Cohen, SH
[J]. JOURNAL OF MARKETING RESEARCH, 1996, 33 (03) : 337 - 350
[16] ESTIMATING DIMENSION OF A MODEL
SCHWARZ, G
[J]. ANNALS OF STATISTICS, 1978, 6 (02) : 461 - 464
