Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments

Cited by: 67
Authors
Celeux, G
Martin, O
Lavergne, C
Affiliations
[1] INRA, Unite Proteom, F-34060 Montpellier, France
[2] Univ Paris Sud, Dept Math, Paris, France
[3] Inst Math & Modelisat Montpellier, Montpellier, France
Keywords
cluster analysis; gene expression profile; linear model; mixture model; penalized likelihood criteria; random effect;
DOI
10.1191/1471082X05st096oa
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Data variability can be important in microarray data analysis. Thus, when clustering gene expression profiles, it could be judicious to make use of repeated data. In this paper, the problem of analysing repeated data in the model-based cluster analysis context is considered. Linear mixed models are chosen to take into account data variability, and mixtures of these models are considered. This leads to a large range of possible models depending on the assumptions made on both the covariance structure of the observations and the mixture model. Maximum likelihood estimation of this family of models through the EM algorithm is presented. The problem of selecting a particular mixture of linear mixed models is considered using penalized likelihood criteria. Illustrative Monte Carlo experiments are presented and an application to the clustering of gene expression profiles is detailed. All these experiments highlight the value of linear mixed model mixtures for taking data variability into account in a cluster analysis context.
Pages: 243-267
Number of pages: 25