Model-based clustering on the unit sphere with an illustration using gene expression profiles

Cited by: 26
Authors
Dortet-Bernadet, Jean-Luc [1]
Wicker, Nicolas [2]
Affiliations
[1] Univ Strasbourg 1, CNRS, UMR 7501, Inst Rech Math Avancee, Strasbourg, France
[2] Univ Strasbourg 1, Inst Genet & Biol Mol & Cellulaire, Lab Bioinformat & Genom Integratives, Strasbourg, France
Keywords
clustering; directional data; microarrays; mixture
DOI
10.1093/biostatistics/kxm012
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
We consider model-based clustering of data that lie on a unit sphere. Such data arise in the analysis of microarray experiments when gene expression profiles are standardized to have mean 0 and variance 1 across the arrays. We propose to model the clusters on the sphere with inverse stereographic projections of multivariate normal distributions, and we describe the corresponding model-based clustering algorithm. The algorithm is applied first to simulated data sets, to assess the performance of several criteria for determining the number of clusters and to compare it with existing methods, and second to a real reference data set of standardized gene expression profiles.
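The two geometric ingredients of the abstract can be illustrated with a minimal sketch in Python (standard library only). It assumes population variance (dividing by n), so a standardized profile of length n has Euclidean norm sqrt(n) and becomes a unit vector after dividing by sqrt(n); because it is centered, it also lies in the hyperplane orthogonal to the all-ones vector. The projection pole, the isotropic covariance, and every function name below (`standardize`, `to_unit_sphere`, `inverse_stereographic`, `stereographic`, `sample_projected_normal`) are illustrative assumptions, not the authors' parameterization or code.

```python
import math
import random


def standardize(profile):
    """Center a profile to mean 0 and scale it to variance 1.

    Uses the population variance (divide by n), so the result has
    squared Euclidean norm exactly n.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    sd = math.sqrt(sum(c * c for c in centered) / n)
    if sd == 0.0:
        raise ValueError("a constant profile cannot be standardized")
    return [c / sd for c in centered]


def to_unit_sphere(profile):
    """Standardize, then divide by sqrt(n) so the vector has norm 1."""
    z = standardize(profile)
    scale = math.sqrt(len(z))
    return [v / scale for v in z]


def inverse_stereographic(x):
    """Map a point x in R^d onto the unit sphere S^d in R^(d+1).

    Assumed convention: projection from the north pole (0, ..., 0, 1),
    so the origin of R^d maps to the south pole (0, ..., 0, -1).
    """
    r2 = sum(v * v for v in x)
    return [2.0 * v / (1.0 + r2) for v in x] + [(r2 - 1.0) / (r2 + 1.0)]


def stereographic(y):
    """Inverse of inverse_stereographic (undefined at the north pole)."""
    last = y[-1]
    return [v / (1.0 - last) for v in y[:-1]]


def sample_projected_normal(mean, sigma, size, rng):
    """Draw points on S^d by projecting draws from N(mean, sigma^2 I).

    The isotropic covariance is a simplifying assumption of this sketch;
    a general covariance matrix would need a Cholesky factor.
    """
    return [
        inverse_stereographic([rng.gauss(m, sigma) for m in mean])
        for _ in range(size)
    ]
```

With the sample-variance convention (dividing by n - 1) the standardized norm would be sqrt(n - 1) instead, so only the rescaling constant changes. Every point returned by `sample_projected_normal` has norm 1 by construction, which is what makes projected normals usable as cluster components on the sphere.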
Pages: 66-80
Page count: 15