How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification

Cited by: 176
Authors
Hennig, Christian [1]
Liao, Tim F. [2]
Affiliations
[1] UCL, London WC1E 6BT, England
[2] Univ Illinois, Urbana, IL 61801 USA
Funding
US National Science Foundation
Keywords
Average silhouette width; Cluster philosophy; Dissimilarity measure; Interpretation of clustering; k-medoids clustering; Latent class clustering; Mixture model; Number of clusters; Social stratification; Principal component analysis; Latent class analysis; Social class; Net worth; Number; Distributions; Regression; Criterion; Selection
DOI
10.1111/j.1467-9876.2012.01066.x
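Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract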
Data with mixed-type (metric/ordinal/nominal) variables are typical for social stratification, i.e. partitioning a population into social classes. Approaches to cluster such data are compared, namely a latent class mixture model assuming local independence and dissimilarity-based methods such as k-medoids. The design of an appropriate dissimilarity measure and the estimation of the number of clusters are discussed as well, comparing the Bayesian information criterion with dissimilarity-based criteria. The comparison is based on a philosophy of cluster analysis that connects the problem of a choice of a suitable clustering method closely to the application by considering direct interpretations of the implications of the methodology. The application of this philosophy to economic data from the 2007 US Survey of Consumer Finances demonstrates techniques and decisions required to obtain an interpretable clustering. The clustering is shown to be significantly more structured than a suitable null model. One result is that the data-based strata are not as strongly connected to occupation categories as is often assumed in the literature.
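As a rough illustration of the dissimilarity-based route named in the abstract, the sketch below combines a mixed-type dissimilarity, k-medoids, and the average silhouette width. It is not the authors' method: the Gower-style averaging of per-variable distances, the brute-force medoid search (workable only for toy sizes), the three variables, and the data values are all assumptions made for this example.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical toy record: (income in $1000s [metric],
#                           education level 0-3 [ordinal],
#                           housing status [nominal]).
Record = Tuple[float, int, str]


def dissimilarity_matrix(data: Sequence[Record]) -> List[List[float]]:
    """Gower-style dissimilarity (an assumption): mean of per-variable distances in [0, 1]."""
    incomes = [r[0] for r in data]
    levels = [r[1] for r in data]
    income_range = (max(incomes) - min(incomes)) or 1.0  # avoid division by zero
    level_range = (max(levels) - min(levels)) or 1

    def d(a: Record, b: Record) -> float:
        metric = abs(a[0] - b[0]) / income_range   # range-scaled absolute difference
        ordinal = abs(a[1] - b[1]) / level_range   # levels treated as equally spaced scores
        nominal = 0.0 if a[2] == b[2] else 1.0     # simple matching
        return (metric + ordinal + nominal) / 3.0

    return [[d(a, b) for b in data] for a in data]


def k_medoids_brute_force(dist: List[List[float]], k: int) -> List[int]:
    """Try every set of k medoids and keep the one with the smallest total dissimilarity."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of observations")
    best_cost, best_medoids = float("inf"), ()
    for medoids in combinations(range(n), k):
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Label each observation with the position of its nearest medoid.
    return [min(range(k), key=lambda c: dist[i][best_medoids[c]]) for i in range(n)]


def average_silhouette_width(dist: List[List[float]], labels: List[int]) -> float:
    """Mean silhouette over all observations; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: silhouette 0 by convention
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


# Hypothetical data: two loosely separated groups.
data: List[Record] = [
    (20.0, 0, "rent"), (22.0, 0, "rent"), (25.0, 1, "rent"),
    (90.0, 3, "own"), (95.0, 3, "own"), (100.0, 2, "own"),
]
dist = dissimilarity_matrix(data)
for k in (2, 3):
    labels = k_medoids_brute_force(dist, k)
    print(k, labels, round(average_silhouette_width(dist, labels), 3))
```

Comparing the printed average silhouette widths across values of k is one dissimilarity-based way to choose the number of clusters. The equal weighting of the three variables is the main design decision here, and it would need to be justified by the application.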
Pages: 309-369
Number of pages: 61