Robust complementary hierarchical clustering for gene expression data analysis by β-divergence

Cited by: 10
Authors
Badsha, Md. Bahadur [1]
Mollah, Md. Nurul Haque [2]
Jahan, Nusrat [3]
Kurata, Hiroyuki [1, 4]
Affiliations
[1] Kyushu Inst Technol, Dept Biosci & Bioinformat, Iizuka, Fukuoka 8208502, Japan
[2] Rajshahi Univ, Dept Stat, Rajshahi 6205, Bangladesh
[3] Rajshahi Univ, Dept Appl Math, Rajshahi 6205, Bangladesh
[4] Kyushu Inst Technol, Biomed Informat R&D Ctr, Iizuka, Fukuoka 8208502, Japan
Keywords
Gene expression; DNA microarray; Robust complementary hierarchical clustering (RCHC); Maximum beta-likelihood; Relative gene importance; Selection procedure of beta; Robustness; Microarray data; Sensitivity
DOI
10.1016/j.jbiosc.2013.03.010
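Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline code
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering]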
Abstract
Hierarchical clustering (HC) is one of the most widely used unsupervised statistical techniques for analyzing microarray gene expression data. When HC is applied to gene expression data to cluster individuals, most HC algorithms form clusters from the strongly differentially expressed (DE) genes that share very similar expression patterns. These strongly DE genes are sometimes irrelevant to the biological processes of interest, and the serious problem is that such irrelevant, highly expressed genes can drown out weakly expressed genes that have important biological functions. To overcome this problem, Nowak and Tibshirani proposed complementary hierarchical clustering (CHC) (Biostatistics, 9, 467-483, 2008). However, CHC is not robust against outlying expression values and often produces misleading results when the gene expression data contain contamination. We therefore propose robust CHC (RCHC), which robustifies CHC against outliers by maximizing the β-likelihood function to sequentially extract gene sets together with the appropriate groupings of individuals. The proposed method reduces to CHC as the tuning parameter β → 0. The value of β plays a key role in the performance of RCHC because it controls the tradeoff between the robustness and the efficiency of the estimators. In simulations and a real gene expression analysis, RCHC clusters gene expression data robustly in the presence of contamination, overcomes the problem of CHC, and predicts critically important genes in breast cancer data. (C) 2013, The Society for Biotechnology, Japan. All rights reserved.
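The key technical idea in the abstract is that maximizing a β-likelihood down-weights outlying observations, while β → 0 recovers the ordinary, non-robust maximum-likelihood fit. The Python sketch below illustrates that tradeoff for a single Gaussian variable. It is an illustration under stated assumptions, not the RCHC algorithm: the function name `beta_gaussian_fit`, the fixed-point update with weights exp(-β(x - μ)²/(2σ²)) and a (1 + β) variance correction (one common form of the minimum β-divergence estimator for a normal model), and the median/MAD starting values are all choices made here rather than details given in this record.

```python
import math
import statistics
from typing import Sequence, Tuple


def beta_gaussian_fit(x: Sequence[float], beta: float,
                      tol: float = 1e-10, max_iter: int = 1000) -> Tuple[float, float]:
    """Estimate (mean, variance) of a 1-D Gaussian by a minimum beta-divergence
    fixed-point iteration. Hypothetical illustration, not the RCHC code.

    beta = 0 gives the ordinary maximum-likelihood estimates; beta > 0
    down-weights observations that are unlikely under the current fit.
    """
    if len(x) < 2:
        raise ValueError("need at least two observations")
    if beta < 0:
        raise ValueError("beta must be non-negative")

    # Robust starting values (an assumption of this sketch): the median and
    # the MAD scaled by 1.4826 so it estimates the SD of normal data.
    mu = statistics.median(x)
    mad = statistics.median(abs(xi - mu) for xi in x) * 1.4826
    var = mad ** 2 if mad > 0 else statistics.pvariance(x)
    if var <= 0:
        raise ValueError("data have zero spread")

    for _ in range(max_iter):
        # beta-weights: w_i = exp(-beta * (x_i - mu)^2 / (2 * var)); all equal 1 when beta = 0.
        w = [math.exp(-beta * (xi - mu) ** 2 / (2.0 * var)) for xi in x]
        sw = sum(w)
        if sw == 0.0:
            raise RuntimeError("all weights underflowed; try a smaller beta")
        # Weighted mean.
        mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sw
        # Weighted variance, inflated by (1 + beta) to offset the shrinkage the weights cause.
        var_new = (1.0 + beta) * sum(wi * (xi - mu_new) ** 2 for wi, xi in zip(w, x)) / sw
        converged = abs(mu_new - mu) < tol and abs(var_new - var) < tol * max(1.0, var)
        mu, var = mu_new, var_new
        if converged:
            break
    return mu, var


if __name__ == "__main__":
    data = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]  # the last value is a planted outlier
    print(beta_gaussian_fit(data, beta=0.0))  # mean is about 8.33, pulled toward the outlier
    print(beta_gaussian_fit(data, beta=0.5))  # mean stays near 0
```

With β = 0 every weight is 1, so the estimate is the plain sample mean and variance, and the single outlier drags the mean to about 8.33. With β = 0.5 the outlier's weight is effectively zero and the mean stays near 0. A larger β buys more robustness at some cost in efficiency on clean data, which is the tradeoff the abstract describes.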
Pages: 397-407
Number of pages: 11
Related papers (22 in total)
[1] Allison DB, Cui XQ, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics, 2006, 7(1): 55-65.
[2] Barry WT, Nobel AB, Wright FA. A statistical framework for testing functional categories in microarray data. Annals of Applied Statistics, 2008, 2(1): 286-315.
[3] Basu A, Harris IR, Hjort NL, Jones MC. Robust and efficient estimation by minimising a density power divergence. Biometrika, 1998, 85(3): 549-559.
[4] Bergemann TL, Wilson J. Proportion statistics to detect differentially expressed genes: a comparison with log-ratio statistics. BMC Bioinformatics, 2011, 12.
[5] Datta S, Datta S. Empirical Bayes screening of many p-values with applications to microarray studies. Bioinformatics, 2005, 21(9): 1987-1994.
[6] Datta S, Satten GA, Xia JZ, Heslin MJ, Datta S. An empirical Bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments. Bioinformatics, 2004, 20(2): 235-242.
[7] Datta S. Statistical techniques for microarray data: a partial overview. Communications in Statistics - Theory and Methods, 2003, 32(1): 263-280.
[8] Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998, 95(25): 14863-14868.
[9] Gottardo R, Raftery AE, Yeung KY, Bumgarner RE. Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics, 2006, 62(1): 10-18.
[10] Hastie T. Genome Biology, 2001, 2.