Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data

Cited by: 1412
Authors
Monti, S [1]
Tamayo, P [1]
Mesirov, J [1]
Golub, T [1]
Affiliations
[1] MIT, Whitehead Inst, Ctr Genome Res, Cambridge, MA 02139 USA
Keywords
unsupervised learning; class discovery; model selection; gene expression microarrays;
DOI
10.1023/A:1023949509487
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
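The resampling idea in the abstract can be illustrated with a minimal sketch. It assumes a consensus matrix whose entry (i, j) is the fraction of resamples in which items i and j were placed in the same cluster, out of the resamples in which both were drawn. The function names `kmeans` and `consensus_matrix`, the 80% subsampling fraction, the 50 resamples, and the choice of a plain K-means with random restart are all illustrative assumptions, not details taken from this record.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random restart: initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep the old center if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Hypothetical helper: co-clustering frequency over subsampled K-means runs."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    size = max(k, int(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)    # subsample without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

On toy data with two well-separated groups, for example `consensus_matrix([(0, 0), (1, 0), (2, 0), (20, 0), (21, 0), (22, 0)], k=2)`, entries for pairs within a group should be close to 1 and entries for pairs across groups close to 0. Entries far from both 0 and 1 would suggest unstable cluster assignments.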
Pages: 91-118
Number of pages: 28
Related papers
39 in total
[1]  
[Anonymous], CANC CELL
[2]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[3]  
BARJOSEPH Z, 2002, IN PRESS BIOINFORMAT
[4]  
Ben-Hur Asa, 2002, Pac Symp Biocomput, P6
[5]   Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses [J].
Bhattacharjee, A ;
Richards, WG ;
Staunton, J ;
Li, C ;
Monti, S ;
Vasa, P ;
Ladd, C ;
Beheshti, J ;
Bueno, R ;
Gillette, M ;
Loda, M ;
Weber, G ;
Mark, EJ ;
Lander, ES ;
Wong, W ;
Johnson, BE ;
Golub, TR ;
Sugarbaker, DJ ;
Meyerson, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (24) :13790-13795
[6]   ON SOME SIGNIFICANCE TESTS IN CLUSTER-ANALYSIS [J].
BOCK, HH .
JOURNAL OF CLASSIFICATION, 1985, 2 (01) :77-108
[7]  
Cheeseman P.C., 1996, ADV KNOWLEDGE DISCOV, V180, P153, DOI https://doi.org/10.5555/257938.257954
[8]   Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables [J].
Chickering, DM ;
Heckerman, D .
MACHINE LEARNING, 1997, 29 (2-3) :181-212
[9]  
Cowell F.A., 2011, LSE Perspectives in Economic Analysis, Vthird, DOI 10.1093/acprof:osobl/9780199594030.001.0001
[10]  
Duda R. O., 2000, Pattern Classification and Scene Analysis, V2nd