Bayesian unsupervised classification framework based on stochastic partitions of data and a parallel search strategy

Cited by: 12
Authors
Corander J. [1]
Gyllenberg M. [2]
Koski T. [3]
Affiliations
[1] Department of Mathematics, Åbo Akademi University
[2] Department of Mathematics and Statistics, Rolf Nevanlinna Institute, University of Helsinki, Helsinki 00014
[3] Department of Mathematics, Royal Institute of Technology
Source
Adv. Data Anal. Classif. | 2009 / Vol. 1 / No. 3-24
Funding
Academy of Finland;
Keywords
Bayesian classification; Markov chain Monte Carlo; Statistical learning; Stochastic optimization;
DOI
10.1007/s11634-009-0036-9
Abstract
Advantages of statistical model-based unsupervised classification over heuristic alternatives have been widely demonstrated in the scientific literature. However, the existing model-based approaches are often both conceptually and numerically unstable for large and complex data sets. Here we consider a Bayesian model-based method for unsupervised classification of discrete-valued vectors that has certain advantages over standard solutions based on latent class models. Our theoretical formulation defines a posterior probability measure on the space of classification solutions corresponding to stochastic partitions of observed data. To efficiently explore the classification space we use a parallel search strategy based on non-reversible stochastic processes. A decision-theoretic approach is utilized to formalize the inferential process in the context of unsupervised classification. Both real and simulated data sets are used to illustrate the discussed methods. © 2009 Springer-Verlag.
Pages: 3-24
Page count: 21
Related papers
34 in total
[1] Aarts E.H.L., Korst J., Simulated Annealing and Boltzmann Machines, (1989)
[2] Bernardo J.M., Smith A.F.M., Bayesian Theory, (1994)
[3] Bock H.-H., Probabilistic models in cluster analysis, Comput Stat Data Anal, 23, pp. 5-28, (1996)
[4] Cerquides J., De Mantaras R.L., TAN classifiers based on decomposable distributions, Machine Learning, 59, 3, pp. 323-354, (2005)
[5] Corander J., Tang J., Bayesian analysis of population structure based on linked molecular information, Mathematical Biosciences, 205, 1, pp. 19-31, (2007)
[6] Corander J., Ekdahl M., Koski T., Parallell interacting MCMC for learning of topologies of graphical models, Data Mining Knowl Discovery, (2008)
[7] Corander J., Gyllenberg M., Koski T., Bayesian model learning based on a parallel MCMC strategy, Statistics and Computing, 16, 4, pp. 355-362, (2006)
[8] Corander J., Gyllenberg M., Koski T., Random partition models and exchangeability for Bayesian identification of population structure, Bulletin of Mathematical Biology, 69, 3, pp. 797-815, (2007)
[9] Corander J., Waldmann P., Marttinen P., Sillanpaa M.J., BAPS 2: Enhanced possibilities for the analysis of genetic population structure, Bioinformatics, 20, 15, pp. 2363-2369, (2004)
[10] Dawson K.J., Belkhir K., A Bayesian approach to the identification of panmictic populations and the assignment of individuals, Genetical Research, 78, 1, pp. 59-77, (2001)