Boosting and microarray data

Cited by: 31
Authors
Long, PM [1]
Vega, VB [1]
Affiliation
[1] Genome Institute of Singapore, Singapore, Singapore
Keywords
supervised learning; classification; boosting; gene expression data; microarray data; bioinformatics
DOI
10.1023/A:1023937123600
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We have found one reason why AdaBoost tends not to perform well on gene expression data, and have identified simple modifications that improve its ability to find accurate class prediction rules. These modifications appear to be especially needed when there is a strong association between expression profiles and class designations. Cross-validation analysis of six microarray datasets with different characteristics suggests that, suitably modified, boosting provides competitive classification accuracy in general. Sometimes the goal of a microarray analysis is to find a class prediction rule that is not only accurate but also depends on the expression levels of only a few genes. Because boosting tries to find genes that are complementary sources of evidence for the correct classification of a tissue sample, it appears especially useful for such gene-efficient class prediction. This appears to be particularly true when there is a strong association between expression profiles and class designations, as is often the case, for example, when comparing tumor and normal samples.
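The abstract does not spell out the authors' modifications, so the sketch below shows only plain, unmodified AdaBoost with single-gene decision stumps as weak learners. The choice of stumps is an assumption made for illustration, not the paper's confirmed method, and every name in it (`adaboost`, `best_stump`, `genes_used`, and so on) is hypothetical. It illustrates two ideas from the abstract: each round reweights the tissue samples so that later stumps concentrate on samples that earlier ones misclassified (a search for complementary genes), and the final rule depends only on the genes its stumps actually use.

```python
import math
from typing import List, Tuple

# Hypothetical weak learner: (gene index, expression threshold, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Predict +polarity if the gene's expression exceeds the threshold, else -polarity."""
    gene, threshold, polarity = stump
    return polarity if x[gene] > threshold else -polarity


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively search genes, thresholds and polarities for the lowest weighted error."""
    best, best_err = None, float("inf")
    for g in range(len(X[0])):
        values = sorted({row[g] for row in X})
        # Candidate thresholds: one below the minimum, plus midpoints of consecutive values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((g, t, p), xi) != yi)
                if err < best_err:
                    best, best_err = (g, t, p), err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    """Plain AdaBoost over single-gene stumps; labels y must be -1/+1."""
    m = len(X)
    w = [1.0 / m] * m  # start with uniform weights over tissue samples
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        # Numeric guard only (not the authors' modification): keep the log finite.
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def genes_used(ensemble) -> List[int]:
    """Distinct genes the final rule depends on (a proxy for gene efficiency)."""
    return sorted({stump[0] for _, stump in ensemble})


if __name__ == "__main__":
    # Toy data: 4 samples x 2 genes; gene 0 alone separates the two classes.
    X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
    y = [-1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, x) for x in X])  # [-1, -1, 1, 1]
    print(genes_used(model))               # [0]
```

Note what happens in this sketch when a single gene separates the classes perfectly: every sample is classified correctly, so the reweighting multiplies all weights by the same factor, they stay uniform, and each later round reselects the same stump. Whether this behavior relates to the failure mode the authors identified is not stated in the abstract.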
Pages: 31-44
Number of pages: 14