Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering

Cited by: 64
Authors
de Brevern, AG
Hazout, S
Malpertuy, A
Affiliations
[1] Univ Paris 07, INSERM, E0346, EBGM, F-75251 Paris 05, France
[2] Atragene Bioinformat, F-91000 Evry, France
DOI
10.1186/1471-2105-5-114
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Microarray technologies produce large amounts of data. Hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs), which are a major drawback for clustering methods. Usually the MVs are left untreated, replaced by zero, or estimated by the k-Nearest Neighbor (kNN) approach. This paper studies the stability of gene clusters, defined by various hierarchical clustering algorithms, in microarray experiments with and without MVs.
Results: In this study, we show that MVs have important effects on the stability of gene clusters. Moreover, the magnitude of the gene misallocations depends on the aggregation algorithm. The most appropriate aggregation methods (e.g. complete-linkage and Ward) are highly sensitive to MVs, surprisingly even for a very small proportion of MVs (e.g. 1%). In most cases, the MVs must be replaced by expected values. Replacing MVs by the kNN approach clearly improves the identification of co-expressed gene clusters. Nevertheless, we observe that the kNN approach is less suitable for extreme values of gene expression.
Conclusion: The presence of MVs (even at a low rate) is a major factor of gene cluster instability. In addition, the impact depends on the hierarchical clustering algorithm used. Some methods should be used carefully. Nevertheless, the kNN approach is an efficient method for restoring missing gene expression values, with a low error level. Our study highlights the need for statistical treatment of microarray data to avoid misinterpretation.
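The kNN replacement of MVs mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the choice of k, the distance (root-mean-square difference over the conditions observed in both genes) and the toy matrix are all assumptions made here for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows (genes).

    Assumed, simplified variant: distance is the root-mean-square
    difference over the columns observed in both rows.
    """
    filled = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                # A neighbour must be another gene with a value in column j.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # stays None if no usable neighbour exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical toy data: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],   # one missing value
    [1.0, 1.9, 3.2],
    [9.0, 8.0, 7.0],    # distant gene, should not be selected for k=2
]
imputed = knn_impute(expression, k=2)
```

In this toy case the missing entry is estimated from the two genes with the most similar profiles, and the distant fourth gene is ignored. Because the estimate is an average of neighbours, it tends toward the middle of the neighbours' range, which is one plausible reading of why the abstract reports kNN as less suitable for extreme expression values.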
Pages: 12