Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses

Cited by: 29
Authors
Bertoni, Alberto [1]
Valentini, Giorgio [1]
Affiliations
[1] Univ Milan, Dipartimento Sci Informaz, DSI, Milan, Italy
Keywords
gene expression data clustering; assessment of cluster stability; cluster reliability; random subspace; random projections; DNA microarrays;
DOI
10.1016/j.artmed.2006.03.005
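Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology];
Abstract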
Objective: Clustering algorithms may be applied to the analysis of DNA microarray data to identify novel subgroups that may lead to new taxonomies of diseases defined at the bio-molecular level. A major problem in identifying biologically meaningful clusters is the assessment of their reliability, since clustering algorithms may find clusters even if no structure is present. Methodology: Recently, methods based on random "perturbations" of the data, such as bootstrapping, noise injection techniques and random subspace methods, have been applied to the problem of cluster validity estimation. In this framework, we propose stability measures that exploit the high dimensionality of DNA microarray data and the redundancy of information stored in microarray chips. To this end we randomly project the original gene expression data into lower dimensional subspaces, approximately preserving the distances between the examples according to the Johnson-Lindenstrauss (JL) theory. The stability of the clusters discovered in the original high dimensional space is estimated by comparing them with the clusters discovered in randomly projected lower dimensional subspaces. The proposed cluster-stability measures may be applied to validate and to quantitatively assess the reliability of the clusters obtained by a large class of clustering algorithms. Results and conclusion: We tested the effectiveness of our approach with high dimensional synthetic data, whose distribution is known a priori, showing that the stability measures based on randomized maps correctly predict the number of clusters and the reliability of each individual cluster. Then we showed how to apply the proposed measures to the analysis of DNA microarray data, whose underlying distribution is unknown. We evaluated the validity of clusters discovered by hierarchical clustering algorithms in diffuse large B-cell lymphoma (DLBCL) and malignant melanoma patients, showing that the proposed reliability measures can support bio-medical researchers in the identification of stable clusters of patients and in the discovery of new subtypes of diseases characterized at the bio-molecular level. © 2006 Elsevier B.V. All rights reserved.
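The idea summarized in the abstract can be illustrated with a rough sketch. It is not the authors' implementation or their exact stability measure: it assumes a Gaussian random map as the projection, substitutes a small deterministic k-means for the hierarchical clustering used in the paper, and scores each original cluster by how often its pairs of examples stay together across projections. All function names and parameters (`random_projection`, `cluster_stability`, `d_new`, `n_proj`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def random_projection(X, d_new, rng):
    """Map the rows of X (n x d) to d_new dimensions with a Gaussian random matrix.
    The 1/sqrt(d_new) scaling keeps squared distances unbiased in expectation."""
    R = rng.normal(size=(X.shape[1], d_new)) / np.sqrt(d_new)
    return X @ R

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))

def kmeans(X, k, n_iter=50):
    """Small k-means with deterministic farthest-point initialization
    (a stand-in for any clustering algorithm, assumed here for brevity)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_stability(X, k, d_new, n_proj, seed=0):
    """Cluster X in the original space, then re-cluster n_proj random projections.
    For each original cluster, return the average fraction of its example pairs
    that are again placed together in the projected clusterings."""
    rng = np.random.default_rng(seed)
    base = kmeans(X, k)
    n = X.shape[0]
    co = np.zeros((n, n))
    for _ in range(n_proj):
        lab = kmeans(random_projection(X, d_new, rng), k)
        co += lab[:, None] == lab[None, :]
    co /= n_proj
    scores = {}
    for j in range(k):
        idx = np.flatnonzero(base == j)
        if len(idx) < 2:
            scores[j] = float("nan")  # no pairs to evaluate
            continue
        sub = co[np.ix_(idx, idx)]
        scores[j] = float(sub[np.triu_indices(len(idx), k=1)].mean())
    return base, scores

if __name__ == "__main__":
    # Synthetic example: two well-separated groups in 2000 dimensions.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 2000)), rng.normal(3, 1, (30, 2000))])
    base, scores = cluster_stability(X, k=2, d_new=200, n_proj=10)
    print(scores)
```

Scores near 1 suggest that a cluster survives the random projections, while lower scores suggest that its members are frequently split apart. The choice of `d_new` trades distance distortion against the strength of the perturbation, which is the role the JL theory plays in the abstract.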
Pages: 85-109
Number of pages: 25