Minimum redundancy feature selection from microarray gene expression data

Cited by: 1438
Authors
Ding, C [1]
Peng, HC [1]
Affiliation
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, NERSC Div, Berkeley, CA 94720 USA
Source
PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE | 2003
Keywords
DOI
10.1109/CSB.2003.1227396
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Selecting a small subset of genes out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. Feature sets obtained through the minimum redundancy-maximum relevance framework represent a broader spectrum of characteristics of phenotypes than those obtained through standard ranking methods; they are more robust, generalize well to unseen data, and lead to significantly improved classifications in extensive experiments on five gene expression data sets.
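
The contrast the abstract draws between plain ranking and redundancy-aware selection can be illustrated with a minimal sketch. The abstract does not state the scoring function, so everything specific here is an assumption: expression levels are taken to be already discretized, relevance and redundancy are both measured by mutual information, and the score is relevance minus mean redundancy. The names `mutual_information`, `mrmr_select`, `genes`, and `labels` are hypothetical and the data are a toy example, not results from the paper.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy selection under the assumed score: relevance minus mean redundancy.

    features: dict mapping a gene name to its discretized value per sample.
    labels:   phenotype class per sample.
    """
    relevance = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    # Start with the single most relevant gene (ties go to the first one listed).
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name, values in features.items():
            if name in selected:
                continue
            # Mean mutual information with the genes already chosen.
            redundancy = sum(
                mutual_information(values, features[s]) for s in selected
            ) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data: four phenotype classes, two samples each.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
genes = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0,1} from {2,3}
    "g2": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of g1
    "g3": [1, 0, 1, 1, 0, 0, 1, 1],  # noisier, but complementary to g1
}

print(mrmr_select(genes, labels, 2))  # prints ['g1', 'g3']
```

In this toy case, ranking by relevance alone would pick g1 and g2, which carry identical information, whereas the redundancy penalty steers the second pick to g3 even though g3 is individually less relevant.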
Pages: 523-528
Page count: 6