Random forests for genomic data analysis

Cited by: 630
Authors
Chen, Xi [1]
Ishwaran, Hemant
Affiliations
[1] Vanderbilt Univ, Dept Biostat, Nashville, TN 37232 USA
Funding
US National Science Foundation
Keywords
Random forests; Random survival forests; Classification; Prediction; Variable selection; Genomic data analysis; Variable importance measures; Machine learning algorithms; Gene-expression data; Pathway analysis; Performance; Model; SNPs; Association
DOI
10.1016/j.ygeno.2012.04.003
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications of RF to genomic data and recent progress in the area, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning. © 2012 Elsevier Inc. All rights reserved.
Pages: 323-329
Page count: 7