Boosting and microarray data

Cited by: 31
Authors
Long, PM [1]
Vega, VB [1]
Affiliations
[1] Genome Inst Singapore, Singapore, Singapore
Keywords
supervised learning; classification; boosting; gene expression data; microarray data; bioinformatics
DOI
10.1023/A:1023937123600
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We have found one reason why AdaBoost tends not to perform well on gene expression data, and identified simple modifications that improve its ability to find accurate class prediction rules. These modifications appear especially to be needed when there is a strong association between expression profiles and class designations. Cross-validation analysis of six microarray datasets with different characteristics suggests that, suitably modified, boosting provides competitive classification accuracy in general. Sometimes the goal in a microarray analysis is to find a class prediction rule that is not only accurate, but that depends on the level of expression of few genes. Because boosting makes an effort to find genes that are complementary sources of evidence of the correct classification of a tissue sample, it appears especially useful for such gene-efficient class prediction. This appears particularly to be true when there is a strong association between expression profiles and class designations, which is often the case for example when comparing tumor and normal samples.
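To make the idea of gene-efficient class prediction concrete, the following is a minimal sketch of standard AdaBoost with decision stumps, where each stump thresholds the expression level of a single gene, so the number of distinct genes in the final rule is at most the number of boosting rounds. This is plain AdaBoost, not the modified variants the abstract refers to (the abstract does not spell those out). The function names, the toy data, and the clamping of the weighted error away from 0 and 1 are assumptions made for this illustration only.

```python
import math

def stump_output(row, gene, threshold, sign):
    """Predict `sign` if the gene is expressed above the threshold, else -sign."""
    return sign if row[gene] > threshold else -sign

def train_stump(X, y, w):
    """Return (weighted_error, gene, threshold, sign) for the best single-gene stump."""
    best = None
    for gene in range(len(X[0])):
        values = sorted(set(row[gene] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for threshold in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_output(row, gene, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, gene, threshold, sign)
    return best

def adaboost(X, y, rounds):
    """Standard AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, gene, threshold, sign = train_stump(X, y, w)
        # Assumption of this sketch: clamp the error so the vote weight stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, gene, threshold, sign))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_output(row, gene, threshold, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_output(row, gene, threshold, sign)
                for alpha, gene, threshold, sign in ensemble)
    return 1 if score >= 0 else -1

def genes_used(ensemble):
    """Distinct gene indices the final rule depends on."""
    return sorted({gene for _, gene, _, _ in ensemble})

# Hypothetical toy data: 6 samples x 3 genes; gene 0 separates the two classes.
X = [[2.0, 0.1, 5.0], [1.8, 0.9, 4.0], [2.2, 0.4, 6.0],
     [0.3, 0.8, 5.5], [0.5, 0.2, 4.5], [0.1, 0.6, 6.5]]
y = [1, 1, 1, -1, -1, -1]

model = adaboost(X, y, rounds=3)
print(genes_used(model))                      # [0]
print([predict(model, row) for row in X])     # matches y
```

In this toy case a single gene separates the classes perfectly, so the weighted error of the best stump is zero and the unclamped vote weight would be infinite; the clamp is only one simple way to keep the sketch running and should not be read as the authors' fix.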
Pages: 31-44
Page count: 14