Mixture separation for mixed-mode data

Cited by: 36
Authors
Lawrence, CJ [1 ]
Krzanowski, WJ [1 ]
Affiliation
[1] UNIV EXETER,DEPT MATH STAT & OPERAT RES,EXETER EX4 4QE,DEVON,ENGLAND
Keywords
cluster analysis; conditional Gaussian distribution; EM algorithm; graphical modelling; location model; mixture maximum likelihood; simulation
DOI
10.1007/BF00161577
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.
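The model described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the categorical variables have already been collapsed into a single cell index `s`, that there is only one continuous variable `y`, and that each mixture component has its own cell probabilities and cell means while all components and cells share one variance (the homogeneous case). The function names, the random initialisation, and the simulated data are hypothetical choices made for illustration only.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, pi, p, mu, var):
    """Return responsibilities r[n][g] and the observed-data log-likelihood."""
    resp, loglik = [], 0.0
    for s, y in data:
        joint = [pi[g] * p[g][s] * normal_pdf(y, mu[g][s], var)
                 for g in range(len(pi))]
        total = sum(joint)
        loglik += math.log(total)
        resp.append([j / total for j in joint])
    return resp, loglik


def m_step(data, resp, n_groups, n_cells, mu_old):
    """Weighted maximum likelihood updates given the responsibilities."""
    n = len(data)
    pi, p, mu = [], [], []
    for g in range(n_groups):
        w_cell = [0.0] * n_cells   # responsibility mass per cell
        wy_cell = [0.0] * n_cells  # responsibility-weighted sum of y per cell
        for (s, y), r in zip(data, resp):
            w_cell[s] += r[g]
            wy_cell[s] += r[g] * y
        w = sum(w_cell)
        pi.append(w / n)
        p.append([wc / w for wc in w_cell])
        # Keep the previous mean if a cell carries no weight in this component.
        mu.append([wy / wc if wc > 0 else m
                   for wy, wc, m in zip(wy_cell, w_cell, mu_old[g])])
    # One variance shared by every component and cell (homogeneous model).
    var = sum(r[g] * (y - mu[g][s]) ** 2
              for (s, y), r in zip(data, resp)
              for g in range(n_groups)) / n
    return pi, p, mu, max(var, 1e-12)


def fit(data, n_groups, n_cells, n_iter=50, seed=0):
    """Run EM from a random start; return estimates and log-likelihood trace."""
    rng = random.Random(seed)
    ys = [y for _, y in data]
    mean_y = sum(ys) / len(ys)
    var = sum((y - mean_y) ** 2 for y in ys) / len(ys)
    pi = [1.0 / n_groups] * n_groups
    p = [[1.0 / n_cells] * n_cells for _ in range(n_groups)]
    mu = [[rng.choice(ys) for _ in range(n_cells)] for _ in range(n_groups)]
    history = []
    for _ in range(n_iter):
        resp, loglik = e_step(data, pi, p, mu, var)
        history.append(loglik)
        pi, p, mu, var = m_step(data, resp, n_groups, n_cells, mu)
    resp, loglik = e_step(data, pi, p, mu, var)
    history.append(loglik)
    return pi, p, mu, var, resp, history


def simulate(n=200, seed=1):
    """Hypothetical two-component, two-cell data set used only for the demo."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = 0 if rng.random() < 0.5 else 1
        s = 0 if rng.random() < (0.8 if g == 0 else 0.3) else 1
        mean = [[0.0, 1.0], [4.0, 6.0]][g][s]
        data.append((s, rng.gauss(mean, 1.0)))
    return data
```

Calling `fit(simulate(), n_groups=2, n_cells=2)` returns the estimated mixing proportions, cell probabilities, cell means, common variance, responsibilities, and the log-likelihood at each iteration. Each observation can then be assigned to the component with the largest responsibility. Extending the sketch to several continuous variables would replace the scalar variance with a common covariance matrix.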
Pages: 85-92
Number of pages: 8
Related papers
21 in total
[1]  
[Anonymous], 1990, J APPL STAT, DOI 10.1080/02664769000000001
[2]   MULTI-VARIATE PROBIT ANALYSIS [J].
ASHFORD, JR ;
SOWDEN, RR .
BIOMETRICS, 1970, 26 (03) :535-&
[3]   REVIEW OF CLASSIFICATION [J].
CORMACK, RM .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-GENERAL, 1971, 134 :321-+
[4]  
COX DR, 1992, BIOMETRIKA, V79, P441, DOI 10.1093/biomet/79.3.441
[5]  
DAY NE, 1969, BIOMETRIKA, V56, P463, DOI 10.1093/biomet/56.3.463
[6]   ANALYZING MULTIVARIATE FLOW CYTOMETRIC DATA IN AQUATIC SCIENCES [J].
DEMERS, S ;
KIM, J ;
LEGENDRE, P ;
LEGENDRE, L .
CYTOMETRY, 1992, 13 (03) :291-298
[7]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[8]  
EDWARDS D, 1990, J ROY STAT SOC B MET, V52, P3
[9]  
Everitt B., 1993, CLUSTER ANAL
[10]   A FINITE MIXTURE MODEL FOR THE CLUSTERING OF MIXED-MODE DATA [J].
EVERITT, BS .
STATISTICS & PROBABILITY LETTERS, 1988, 6 (05) :305-309