A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture

Cited by: 76
Authors
Bouguila, Nizar [1]
Ziou, Djemel [2]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
[2] Univ Sherbrooke, Dept Informat, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
clustering; correlogram; expectation maximization (EM); finite mixture models; generalized Dirichlet; high-dimensional data; hybrid stochastic expectation maximization algorithm (HSEM); image database summarization; image object recognition; image restoration; maximum likelihood (ML); SEM; Vistex;
DOI
10.1109/TIP.2006.877379
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper applies a robust statistical scheme to the problem of unsupervised learning of high-dimensional data. We develop, analyze, and apply a new finite mixture model based on a generalization of the Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. We show that the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and, thus, without a loss of information. This makes the generalized Dirichlet distribution more practical and useful. We propose a hybrid stochastic expectation maximization algorithm (HSEM) to estimate the parameters of the generalized Dirichlet mixture. The algorithm is called stochastic because it contains a step in which the data elements are assigned randomly to components in order to avoid convergence to a saddle point. The adjective "hybrid" is justified by the introduction of a Newton-Raphson step. Moreover, the HSEM algorithm autonomously selects the number of components by the introduction of an agglomerative term. The performance of our method is tested by the classification of several pattern-recognition data sets. The generalized Dirichlet mixture is also applied to the problems of image restoration, image object recognition and texture image database summarization for efficient retrieval. For the texture image summarization problem, results are reported for the Vistex texture image database from the MIT Media Lab.
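The abstract describes two ingredients that can be sketched concretely: the generalized Dirichlet density and the stochastic step that assigns each data element to a component at random. The sketch below is a minimal illustration only. It assumes the standard Connor-Mosimann parameterization of the generalized Dirichlet distribution and draws each label from the posterior component probabilities, which is the usual form of an SEM stochastic step. The function names (`gd_log_pdf`, `posteriors`, `stochastic_assign`) are hypothetical, and the Newton-Raphson update and the agglomerative term mentioned in the abstract are not shown because the abstract does not specify them.

```python
import math
import random


def gd_log_pdf(x, alpha, beta):
    """Log-density of a generalized Dirichlet distribution at x.

    Assumes the Connor-Mosimann form: x has d positive entries with
    sum(x) < 1, and alpha, beta are length-d lists of positive parameters.
    """
    d = len(x)
    log_p = 0.0
    cum = 0.0  # running sum x_1 + ... + x_i
    for i in range(d):
        cum += x[i]
        # gamma_i = beta_i - alpha_{i+1} - beta_{i+1} for i < d, else beta_d - 1
        if i < d - 1:
            gamma_i = beta[i] - alpha[i + 1] - beta[i + 1]
        else:
            gamma_i = beta[i] - 1.0
        log_p += (math.lgamma(alpha[i] + beta[i])
                  - math.lgamma(alpha[i]) - math.lgamma(beta[i]))
        log_p += (alpha[i] - 1.0) * math.log(x[i]) + gamma_i * math.log(1.0 - cum)
    return log_p


def posteriors(x, weights, params):
    """Posterior probability of each mixture component given x.

    params is a list of (alpha, beta) pairs, one per component.
    """
    logs = [math.log(w) + gd_log_pdf(x, a, b)
            for w, (a, b) in zip(weights, params)]
    m = max(logs)  # subtract the max for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def stochastic_assign(x, weights, params, rng):
    """Stochastic step (assumed form): draw one component label for x
    at random, with probabilities given by the posteriors."""
    post = posteriors(x, weights, params)
    return rng.choices(range(len(post)), weights=post, k=1)[0]


# Hypothetical two-component mixture over 2-dimensional data.
weights = [0.5, 0.5]
params = [([1.0, 1.0], [2.0, 1.0]), ([2.0, 3.0], [4.0, 2.0])]
label = stochastic_assign([0.2, 0.3], weights, params, random.Random(0))
```

With `beta[i] = alpha[i + 1] + beta[i + 1]` for every `i < d`, the exponents `gamma_i` vanish except possibly the last, and the density reduces to an ordinary Dirichlet. This is one way to check an implementation against a known special case.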
Pages: 2657-2668
Page count: 12