Cancer classification using Rotation Forest

Cited by: 122
Authors
Liu, Kun-Hong [1, 2]
Huang, De-Shuang [1]
Affiliations
[1] Chinese Acad Sci, Intelligent Comp Lab, Hefei Inst Intelligent Machines, Hefei 230031, Anhui, Peoples R China
[2] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cancer classification; DNA microarray dataset; multiple classifier system (MCS); Rotation Forest; linear transformation method;
DOI
10.1016/j.compbiomed.2008.02.007
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
We address microarray-based cancer classification using a recently proposed multiple classifier system (MCS) known as Rotation Forest. To the best of our knowledge, this is the first time that Rotation Forest has been applied to microarray dataset classification. In the Rotation Forest framework, a linear transformation method projects the data into a new feature space for each classifier, and the base classifiers are then trained in these different spaces so as to enhance both the accuracy of the base classifiers and the diversity of the ensemble. Principal component analysis (PCA), non-parametric discriminant analysis (NDA) and random projections (RP) were applied to feature transformation in the original Rotation Forest. In this paper, we use independent component analysis (ICA) as a new transformation method, since it can better describe the properties of microarray data. A breast cancer dataset and a prostate dataset are used to validate the efficiency of Rotation Forest. In all experiments, Rotation Forest outperforms other MCSs such as Bagging and Boosting. In addition, the experimental results show that ICA further improves the performance of Rotation Forest compared with the original transformation methods. (C) 2008 Elsevier Ltd. All rights reserved.
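
The ensemble idea in the abstract (each base classifier is trained in its own linearly transformed feature space) can be illustrated with a minimal sketch. This is not the authors' implementation, and it makes several simplifying assumptions: the per-subset transformation is a random orthonormal rotation (a random-projection-style stand-in for PCA or ICA), the base classifier is a single-threshold decision stump rather than a full decision tree, and no bootstrap sampling is performed. All function names (`random_orthonormal`, `make_rotation`, `rotate`, `fit_stump`, `fit_rotation_ensemble`, `predict`) are hypothetical and chosen only for this illustration.

```python
import math
import random
from collections import Counter


def random_orthonormal(k, rng):
    """Random k x k orthonormal matrix (rows) via Gram-Schmidt.

    Assumption: stands in for the PCA/ICA step described in the abstract.
    """
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(k)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # skip degenerate draws
            basis.append([x / norm for x in v])
    return basis


def make_rotation(n_features, subset_size, rng):
    """Split features into random disjoint subsets, one rotation per subset."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    blocks = []
    for start in range(0, n_features, subset_size):
        subset = idx[start:start + subset_size]
        blocks.append((subset, random_orthonormal(len(subset), rng)))
    return blocks


def rotate(row, blocks):
    """Project one sample into the new feature space, block by block."""
    out = []
    for subset, basis in blocks:
        sub = [row[j] for j in subset]
        out.extend(sum(a * b for a, b in zip(sub, axis)) for axis in basis)
    return out


def fit_stump(X, y):
    """Best single-feature threshold rule; axis-aligned, so rotation matters."""
    majority = Counter(y).most_common(1)[0][0]
    best = (len(y) + 1, 0, math.inf, majority, majority)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def fit_rotation_ensemble(X, y, n_members=10, subset_size=2, seed=0):
    """Train each base classifier in its own rotated feature space."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        blocks = make_rotation(len(X[0]), subset_size, rng)
        stump = fit_stump([rotate(row, blocks) for row in X], y)
        members.append((blocks, stump))
    return members


def predict(members, row):
    """Majority vote over the base classifiers."""
    votes = []
    for blocks, (j, t, l_lab, r_lab) in members:
        z = rotate(row, blocks)
        votes.append(l_lab if z[j] <= t else r_lab)
    return Counter(votes).most_common(1)[0][0]
```

Because every member sees a different rotation, the axis-aligned stumps split along different directions of the original space, which is the source of ensemble diversity the abstract refers to. Swapping `random_orthonormal` for a PCA or ICA fit on each feature subset would bring the sketch closer to the method described, at the cost of needing a numerical library.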
Pages: 601-610
Page count: 10