A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data

Cited by: 51
Authors
Teschendorff, AE [1]
Wang, YZ [1]
Barbosa-Morais, NL [1]
Brenton, JD [1]
Caldas, C [1]
Affiliations
[1] Univ Cambridge, Hutchinson MRC Res Ctr, Canc Genom Program, Dept Oncol, Cambridge CB2 2XZ, England
DOI
10.1093/bioinformatics/bti466
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Accurate subcategorization of tumour types through gene-expression profiling requires analytical techniques that estimate the number of categories or clusters rigorously and reliably. Parametric mixture modelling provides a natural setting to address this problem.

Results: We compare a criterion for model selection that is derived from a variational Bayesian framework with a popular alternative based on the Bayesian information criterion. Using simulated data, we show that the variational Bayesian method is more accurate in finding the true number of clusters in situations that are relevant to current and future microarray studies. We also compare the two criteria using freely available tumour microarray datasets and show that the variational Bayesian method is more sensitive in capturing biologically relevant structure.
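
To make the baseline criterion mentioned in the abstract concrete, the following is a minimal sketch of choosing the number of mixture components by the Bayesian information criterion (BIC). It is not the authors' implementation and does not cover the variational Bayesian criterion. The one-dimensional simulated data, the quantile-based initialisation, the variance floor, and all function names (`fit_gmm_1d`, `bic_score`, and so on) are assumptions made purely for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation (an assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid degenerate components

    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def bic_score(data, k):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters in one dimension."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 independent weights
    return -2.0 * fit_gmm_1d(data, k) + n_params * math.log(len(data))


# Simulated toy data (hypothetical): two well-separated clusters.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
data += [random.gauss(6.0, 1.0) for _ in range(100)]

bic = {k: bic_score(data, k) for k in (1, 2, 3, 4)}
best_k = min(bic, key=bic.get)  # the model with the lowest BIC is selected
print("BIC by number of components:", bic)
print("Selected number of components:", best_k)
```

A variational Bayesian criterion would replace `bic_score` with a lower bound on the model evidence computed under priors on the mixture parameters; that step is omitted here because the record gives no details of the authors' formulation.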
Pages: 3025-3033
Number of pages: 9