Performance and Robustness of Penalized and Unpenalized Methods for Genetic Prediction of Complex Human Disease

Cited by: 81
Authors
Abraham, Gad [1, 2, 3]
Kowalczyk, Adam [3 ]
Zobel, Justin [3 ]
Inouye, Michael [1, 2]
Affiliations
[1] Univ Melbourne, Dept Pathol, Parkville, Vic 3010, Australia
[2] Univ Melbourne, Dept Microbiol & Immunol, Parkville, Vic 3010, Australia
[3] Univ Melbourne, NICTA Victoria Res Lab, Dept Comp & Informat Syst, Parkville, Vic 3010, Australia
Funding
Australian Research Council; National Health and Medical Research Council (Australia); Wellcome Trust (UK);
Keywords
lasso; sparse penalized methods; SNP; prediction; complex disease; GENOME-WIDE ASSOCIATION; VARIABLE SELECTION; CROSS-VALIDATION; MULTIPLE COMMON; LASSO; REGULARIZATION; VARIANTS; BIAS;
DOI
10.1002/gepi.21698
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 [Genetics];
Abstract
A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease.
Pages: 184-195
Page count: 12