Gene mining: a novel and powerful ensemble decision approach to hunting for disease genes using microarray expression profiling

Times cited: 76
Authors
Li, X
Rao, SQ
Wang, YD
Gong, BS
Affiliations
[1] Cleveland Clin Fdn, Dept Mol Cardiol, Cleveland, OH 44195 USA
[2] Harbin Med Univ, Dept Biomed Engn Biomath & Bioinformat, Harbin 150086, Peoples R China
[3] Harbin Inst Technol, Dept Comp Sci, Harbin 150001, Peoples R China
[4] Cleveland Clin Fdn, Dept Cardiovasc Med, Cleveland, OH 44195 USA
Funding
National Natural Science Foundation of China
DOI
10.1093/nar/gkh563
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Current applications of microarrays focus on the precise classification or discovery of biological types, for example tumor versus normal phenotypes in cancer research. Several challenging scientific tasks of the post-genomic era, such as identifying the genes that underlie complex diseases from genome-wide expression profiles and then building the corresponding gene networks, have been largely overlooked because no efficient analysis approach existed. We have therefore developed an ensemble decision approach that can efficiently perform multiple gene mining tasks. Applied to two publicly available data sets (colon and leukemia), the approach identified 20 highly significant colon cancer genes and 23 highly significant molecular signatures for refining the acute leukemia phenotype, most of which have been verified either by biological experiments or by alternative analysis approaches. Furthermore, the globally optimal gene subsets identified by the approach have so far achieved the highest accuracy for classifying colon cancer tissue types. This analysis strategy promises to advance microarray technology as a means of deciphering the genetic complexity of complex diseases.
Pages: 2685-2694
Page count: 10
References: 37