A screening methodology based on Random Forests to improve the detection of gene-gene interactions

Cited by: 42
Authors
De Lobel, Lizzy [1]
Geurts, Pierre [2, 3]
Baele, Guy [4, 5]
Castro-Giner, Francesc [6, 7, 8]
Kogevinas, Manolis [6, 7, 8]
Van Steen, Kristel [9, 10]
Affiliations
[1] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium
[2] Univ Liege, Dept Elect Engn & Comp Sci, Liege, Belgium
[3] Univ Liege, GIGA R, Liege, Belgium
[4] Univ Ghent VIB, Dept Plant Syst Biol, B-9052 Ghent, Belgium
[5] Univ Ghent, Dept Mol Genet, B-9000 Ghent, Belgium
[6] Ctr Res Environm Epidemiol, Barcelona, Spain
[7] Municipal Inst Med Res IMIM Hosp Mar, Barcelona, Spain
[8] CIBERESP, Barcelona, Spain
[9] Univ Liege, Montefiore Inst Bioinformat, Stat Genet GIGA, Liege, Belgium
[10] Ghent Univ Hosp, Ctr Med Genet, B-9000 Ghent, Belgium
Keywords
gene-gene interactions; prescreening; Random Forests; Multifactor Dimensionality Reduction; susceptibility genes; asthma; epistasis; cloning
DOI
10.1038/ejhg.2010.48
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology]
Abstract
The search for susceptibility loci in gene-gene interactions poses a methodological and computational challenge for statisticians because of the high dimensionality inherent in modelling gene-gene interactions, or epistasis. In an era in which genome-wide scans have become relatively common, powerful new methods are required to handle the huge number of possible gene-gene interactions and to weed out false positives and false negatives from the results. One solution to the dimensionality problem is to reduce the data by preliminary screening of markers to select the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique that associates sets of markers with optimal predictive properties with disease. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that are able to identify gene-gene interactions. Under various interaction models (purely and not purely epistatic), we apply a Random Forest (RF)-based prescreening method before executing MDR to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed by creating a collection of candidate markers with RFs. We validate our technique by extensive simulation studies and by application to asthma data from the European Community Respiratory Health Survey II.
European Journal of Human Genetics (2010) 18, 1127-1132; doi: 10.1038/ejhg.2010.48; published online 12 May 2010
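As an illustration of the data-reduction step that MDR performs on a set of candidate markers, the following is a minimal sketch and not the authors' implementation. It pools two-SNP genotype cells into high-risk and low-risk groups by comparing each cell's case:control ratio with the overall ratio, then scores the pair by balanced accuracy. The function names `mdr_pair_score` and `best_pair`, the toy data, and the toy interaction rule are all hypothetical. The sketch assumes the candidate list has already been produced by some screening step (the RF prescreening itself is not implemented here), and it scores on the training data only, whereas MDR in practice relies on cross-validation.

```python
from collections import Counter
from itertools import combinations, product


def mdr_pair_score(genotypes, labels, i, j):
    """Balanced accuracy of the high/low-risk rule for SNP pair (i, j).

    genotypes: list of rows, each a sequence of genotype codes (0, 1, 2).
    labels: list of 0 (control) / 1 (case), one per row.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count cases and controls in every two-locus genotype cell.
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = (row[i], row[j])
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    # A cell is "high risk" when its case:control ratio exceeds the threshold.
    cells = set(cases) | set(controls)
    high_risk = {c for c in cells if cases[c] > threshold * controls[c]}

    # Predict "case" in high-risk cells and "control" elsewhere.
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(n for c, n in controls.items() if c not in high_risk)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def best_pair(genotypes, labels, candidates):
    """Exhaustively score all pairs among the candidate SNP indices."""
    return max(combinations(candidates, 2),
               key=lambda pair: mdr_pair_score(genotypes, labels, *pair))


# Toy data (hypothetical): 4 SNPs, every genotype combination appears once.
# Disease status depends only on SNPs 0 and 1; SNPs 2 and 3 are noise.
genotypes = [list(g) for g in product(range(3), repeat=4)]
labels = [(g[0] + g[1]) % 2 for g in genotypes]

candidates = [0, 1, 2, 3]  # assumed output of a prior screening step
print(best_pair(genotypes, labels, candidates))
```

In this toy setting the interacting pair separates cases from controls perfectly, while a pair of noise SNPs scores no better than chance. Shrinking `candidates` before the exhaustive search is where a prescreening step would reduce the number of pairs to evaluate.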
Pages: 1127-1132
Number of pages: 6