Bayesian variable selection in clustering high-dimensional data

Times cited: 152
Authors
Tadesse, MG [1]
Sha, N
Vannucci, M
Affiliations
[1] Univ Penn, Dept Biostat, Philadelphia, PA 19104 USA
[2] Univ Texas, Dept Math Sci, El Paso, TX 79968 USA
[3] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
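Funding
National Science Foundation (USA)
Keywords
Bayesian variable selection; Bayesian clustering; label switching; reversible-jump Markov chain Monte Carlo
DOI
10.1198/016214504000001565
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract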
Over the last decade, technological advances have generated an explosion of data with substantially smaller sample size relative to the number of covariates (p >> n). A common goal in the analysis of such data involves uncovering the group structure of the observations and identifying the discriminating variables. In this article we propose a methodology for addressing these problems simultaneously. Given a set of variables, we formulate the clustering problem in terms of a multivariate normal mixture model with an unknown number of components and use the reversible-jump Markov chain Monte Carlo technique to define a sampler that moves between different dimensional spaces. We handle the problem of selecting a few predictors among the prohibitively vast number of variable subsets by introducing a binary exclusion/inclusion latent vector, which gets updated via stochastic search techniques. We specify conjugate priors and exploit the conjugacy by integrating out some of the parameters. We describe strategies for posterior inference and explore the performance of the methodology with simulated and real datasets.
Pages: 602-617
Number of pages: 16
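
The abstract describes the model only in words. The following is a minimal sketch of how a normal mixture with a binary inclusion vector might be written down. The notation ($\gamma$, $y_i$, $G$, $\mu_k$, $\Sigma_k$, $\eta$, $\Omega$, $\varphi$) is assumed here for illustration and is not taken from this record, and the symmetric proposal in the last line is a simplifying assumption rather than a statement of the authors' exact sampler.

```latex
% Assumed notation: x_i is the p-vector for observation i, y_i its cluster label,
% G the (unknown) number of mixture components, and gamma the binary
% inclusion/exclusion vector over the p variables.
\begin{align*}
  \gamma_j &\sim \mathrm{Bernoulli}(\varphi), \qquad j = 1, \dots, p, \\
  x_{i(\gamma)} \mid y_i = k, \gamma
    &\sim \mathcal{N}\bigl(\mu_{k(\gamma)}, \Sigma_{k(\gamma)}\bigr),
    \qquad k = 1, \dots, G, \\
  x_{i(\gamma^{c})} \mid \gamma
    &\sim \mathcal{N}\bigl(\eta_{(\gamma^{c})}, \Omega_{(\gamma^{c})}\bigr).
\end{align*}
% Selected variables (gamma_j = 1) follow cluster-specific normals;
% the remaining variables share one normal and carry no cluster information.
%
% With conjugate priors, the means and covariances can be integrated out,
% leaving a marginal likelihood f(X, y | G, gamma). A stochastic search step
% proposes gamma' by flipping or swapping entries of gamma and, for a
% symmetric proposal, accepts it with probability
\begin{equation*}
  \min\left\{ 1,\;
    \frac{f(X, y \mid G, \gamma')\, p(\gamma')}
         {f(X, y \mid G, \gamma)\, p(\gamma)} \right\}.
\end{equation*}
```

Under this reading, moves that change $G$ would be handled separately by the reversible-jump steps mentioned in the abstract, while the update of $\gamma$ above keeps $G$ fixed.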