SNP Selection in Genome-Wide and Candidate Gene Studies via Penalized Logistic Regression

Times cited: 167
Authors
Ayers, Kristin L. [1]
Cordell, Heather J. [1]
Affiliation
[1] Inst Human Genet, Newcastle Upon Tyne NE1 3BZ, Tyne & Wear, England
Funding
Wellcome Trust (UK)
Keywords
penalized likelihood; Lasso; elastic net; association analysis; DANTZIG SELECTOR; LEAST ANGLE; ASSOCIATION; SUSCEPTIBILITY; LASSO; REGULARIZATION; REPLICATION; SHRINKAGE; LOCUS;
DOI
10.1002/gepi.20543
Chinese Library Classification number
Q3 [Genetics]
Subject classification code
071007 [Genetics]
Abstract
Penalized regression methods offer an attractive alternative to single-marker testing in genetic association analysis. They shrink to zero the coefficients of markers that have little apparent effect on the trait of interest, resulting in a parsimonious subset of what we hope are the truly pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen to select a good predictive model (via methods such as computationally expensive cross-validation), through a maximum-likelihood-based model selection criterion (such as the BIC), or, as done here, to select a model that controls the type I error rate. We investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease-locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-gamma shrinkage prior implemented in the hyperlasso software, with standard single-locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each method. Results show that penalized methods outperform single-marker analysis. The main difference is that penalized methods allow several markers to be included simultaneously and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for. Genet. Epidemiol. 34:879-891, 2010. (C) 2010 Wiley-Liss, Inc.
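To make the penalties named in the abstract concrete, here is a minimal sketch, assuming SNP genotypes coded 0/1/2 and standardized before fitting, of elastic-net-penalized logistic regression (alpha=1 gives the Lasso, alpha=0 gives ridge). It uses proximal gradient descent (ISTA), a solver swapped in purely for brevity; it is not the authors' code or the hyperlasso software, and it does not cover MCP or the normal-exponential-gamma prior. All function names (`fit_enet_logistic`, `lambda_max`, `sens_spec`, `simulate`) and the simulated data (one causal SNP, five null SNPs) are hypothetical. Only the Python standard library is used.

```python
# Hypothetical sketch: ISTA solver for elastic-net logistic regression on SNPs,
# for illustration only; not the paper's implementation.
import math
import random


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def standardize(X):
    """Center and scale each SNP column (0/1/2 genotypes) to mean 0, SD 1.

    Monomorphic columns (SD 0) become all zeros, so they never enter the model.
    """
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return Z


def lambda_max(Z, y, alpha=1.0):
    """Smallest lambda at which all penalized coefficients are exactly zero."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]; ridge never zeroes coefficients")
    n, p = len(Z), len(Z[0])
    ybar = sum(y) / n
    grads = [abs(sum((ybar - y[i]) * Z[i][j] for i in range(n)) / n) for j in range(p)]
    return max(grads) / alpha


def fit_enet_logistic(Z, y, lam, alpha=1.0, n_iter=300):
    """Minimize (1/n)*NLL + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2) by ISTA.

    alpha=1 is the Lasso, alpha=0 is ridge; the intercept b0 is not penalized.
    """
    n, p = len(Z), len(Z[0])
    ybar = sum(y) / n
    b0 = math.log(ybar / (1.0 - ybar))  # start the intercept at its null-model MLE
    b = [0.0] * p
    # Step 1/L, where L = ||[1, Z]||_F^2 / (4n) bounds the Lipschitz constant
    # of the gradient of the average logistic negative log-likelihood.
    L = (n + sum(v * v for row in Z for v in row)) / (4.0 * n)
    t = 1.0 / L
    for _ in range(n_iter):
        # Residuals p_i - y_i at the current point (shared by every coordinate).
        r = [sigmoid(b0 + sum(bj * zij for bj, zij in zip(b, Z[i]))) - y[i]
             for i in range(n)]
        b0 -= t * sum(r) / n
        for j in range(p):
            g = sum(r[i] * Z[i][j] for i in range(n)) / n
            z = b[j] - t * g
            # Elastic-net proximal step: soft-threshold, then ridge shrinkage.
            shrunk = max(abs(z) - t * lam * alpha, 0.0)
            b[j] = math.copysign(shrunk, z) / (1.0 + t * lam * (1.0 - alpha))
    return b0, b


def sens_spec(selected, causal, p):
    """Sensitivity and specificity of a selected SNP set given the true causal SNPs."""
    selected, causal = set(selected), set(causal)
    null = set(range(p)) - causal
    return len(selected & causal) / len(causal), len(null - selected) / len(null)


def simulate(n=400, p=6, maf=0.3, log_or=1.5, seed=1):
    """Hypothetical data: SNP 0 is causal (per-allele log-OR log_or), the rest null."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g = [sum(rng.random() < maf for _ in range(2)) for _ in range(p)]
        X.append(g)
        y.append(1 if rng.random() < sigmoid(-1.0 + log_or * g[0]) else 0)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    Z = standardize(X)
    lmax = lambda_max(Z, y)
    # Walk down a lambda grid (Lasso, alpha=1) and watch SNPs enter the model.
    for frac in (1.01, 0.5, 0.05):
        _, b = fit_enet_logistic(Z, y, frac * lmax)
        sel = [j for j, bj in enumerate(b) if bj != 0.0]
        print(f"lambda = {frac} * lambda_max: selected {sel}, "
              f"(sensitivity, specificity) = {sens_spec(sel, [0], len(b))}")
```

Running the module prints which SNPs are selected at each point on a small lambda grid, mirroring the idea of tracking how markers enter the model as the penalty is relaxed. The paper instead tunes the penalty to control the type I error rate; this sketch makes no attempt to do that.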
Pages: 879-891
Number of pages: 13