The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases

Cited by: 116
Authors
Heidema, A. Geert
Boer, Jolanda M. A.
Nagelkerke, Nico
Mariman, Edwin C. M.
van der A, Daphne L.
Feskens, Edith J. M.
Affiliations
[1] Natl Inst Publ Hlth & Environm, Ctr Nutr & Hlth, NL-3720 BA Bilthoven, Netherlands
[2] United Arab Emirates Univ, Dept Community Med, Al Ain, U Arab Emirates
[3] Maastricht Univ, NL-6200 MD Maastricht, Netherlands
[4] Univ Wageningen & Res Ctr, Div Human Nutr, NL-6700 EV Wageningen, Netherlands
Keywords
DOI
10.1186/1471-2156-7-23
CLC classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Genetic epidemiologists have taken up the challenge of identifying genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with the available methods for assessing their association with complex diseases. Statistical methods have been developed for analyzing the relation of large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis; neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN); and several non-parametric methods, which include the set association approach, the combinatorial partitioning method (CPM), the restricted partitioning method (RPM), the multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods for association studies with large numbers of predictor variables. GPNN, on the other hand, may be a useful approach for selecting and modeling important predictors, but its ability to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and the random forests approach can handle a large number of predictors and are useful in reducing them to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable.
As the non-parametric methods have different strengths and weaknesses, we conclude that, for genetic association studies using the case-control design, applying a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy for finding the important genes and interaction patterns involved in complex diseases.
Pages: 15