Enriched random forests

Cited by: 158
Authors
Amaratunga, Dhammika [1]
Cabrera, Javier [2]
Lee, Yung-Seop [3]
Affiliations
[1] Johnson & Johnson PRD LLC, Dept Nonclin Biostat, Raritan, NJ 08869 USA
[2] Rutgers State Univ, Dept Stat, Piscataway, NJ 08854 USA
[3] Dongguk Univ, Dept Stat, Seoul, South Korea
DOI
10.1093/bioinformatics/btn356
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Although the random forest classification procedure works well in datasets with many features, when the number of features is huge and the percentage of truly informative features is small, such as with DNA microarray data, its performance tends to decline significantly. In such instances, the procedure can be improved by reducing the contribution of trees whose nodes are populated by non-informative features. To some extent, this can be achieved by prefiltering, but we propose a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features. This results in an enriched random forest. We illustrate the superior performance of this procedure in several actual microarray datasets.
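The node-level adjustment described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the weights are computed, so the absolute Welch t-statistic used here is an assumption, and the names `feature_weights` and `weighted_subset` are hypothetical. The only part taken from the abstract is the idea of drawing each node's eligible feature subset by weighted rather than simple random sampling.

```python
import random


def feature_weights(X, y):
    """Assumed weighting (not from the abstract): absolute Welch t-statistic
    per feature for a two-class problem with labels 0 and 1."""
    n_features = len(X[0])
    weights = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        var_a = sum((v - mean_a) ** 2 for v in a) / max(len(a) - 1, 1)
        var_b = sum((v - mean_b) ** 2 for v in b) / max(len(b) - 1, 1)
        spread = (var_a / len(a) + var_b / len(b)) ** 0.5
        # Small constants keep every weight strictly positive and avoid
        # division by zero for constant features.
        weights.append(abs(mean_a - mean_b) / (spread + 1e-12) + 1e-6)
    return weights


def weighted_subset(weights, k, rng):
    """Draw k distinct feature indices, each draw proportional to weight.
    This replaces the simple random sampling of a standard random forest."""
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(min(k, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(pick))
        remaining.pop(pick)
    return chosen


# Toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [
    [0.0, 1.0, 0.5],
    [0.1, 2.0, 1.5],
    [0.2, 3.0, 1.0],
    [5.0, 1.0, 1.0],
    [5.1, 2.0, 0.5],
    [5.2, 3.0, 1.5],
]
y = [0, 0, 0, 1, 1, 1]

weights = feature_weights(X, y)
rng = random.Random(0)
subset = weighted_subset(weights, 2, rng)  # eligible features for one node
```

In a full forest, `weighted_subset` would be called once per node in place of the uniform draw, so that trees are less often built on non-informative features.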
Pages: 2010-2014
Number of pages: 5