Applying compressed sensing to genome-wide association studies

Cited by: 18
Authors
Vattikuti, Shashaank [1]
Lee, James J. [1, 2, 6]
Chang, Christopher C. [3, 6]
Hsu, Stephen D. H. [4, 5, 6]
Chow, Carson C. [1]
Affiliations
[1] NIDDK, Math Biol Sect, Lab Biol Modeling, NIH, Bethesda, MD 20814 USA
[2] Univ Minnesota Twin Cities, Dept Psychol, Minneapolis, MN 55455 USA
[3] BGI Hong Kong, Tai Po, Hong Kong, Peoples R China
[4] Michigan State Univ, Dept Phys, E Lansing, MI 48824 USA
[5] Michigan State Univ, Res & Grad Studies, E Lansing, MI 48824 USA
[6] BGI Shenzhen, Cognit Genom Lab, Shenzhen, Peoples R China
Source
GIGASCIENCE | 2014 / Vol. 3
Keywords
GWAS; Genomic selection; Compressed sensing; Lasso; Underdetermined system; Sparsity; Phase transition; PHASE-TRANSITIONS; MODEL SELECTION; PREDICTION; HERITABILITY; PERFORMANCE; REGRESSION; LINKAGE; LASSO; TOOL;
DOI
10.1186/2047-217X-3-10
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal recovery when the number of predictor variables (i.e., genotyped markers) exceeds the sample size. Its applicability to GWAS has not been investigated. Results: Using CS theory, we show that all markers with nonzero coefficients can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability equal to one (h^2 = 1), there is a sharp phase transition from poor performance to complete selection as the sample size is increased. For heritability below one, complete selection still occurs, but the transition is smoothed. We find for h^2 ≈ 0.5 that a sample size of approximately thirty times the number of markers with nonzero coefficients is sufficient for full selection. This boundary is only weakly dependent on the number of genotyped markers. Conclusion: Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region. Given a limited sample size, it is possible to discover a phase transition by increasing the penalization; in this case a subset of the support may be recovered. Applying this approach to the GWAS analysis of height, we show that 70-100% of the selected markers are strongly correlated with height-associated markers identified by the GIANT Consortium.
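The selection behaviour described in the abstract can be illustrated with a small synthetic experiment. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the design matrix is standard Gaussian rather than real genotype data, the lasso is solved with a plain proximal-gradient (ISTA) loop, and the function names (`soft_threshold`, `lasso_ista`, `support_recall`), the penalty `lam = 0.1`, the problem sizes, and the selection threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b


def support_recall(n, p=200, s=5, h2=1.0, lam=0.1, seed=0):
    """Fraction of the s truly nonzero markers that the lasso selects.

    Assumptions (illustrative only): Gaussian design, effects of +/-1,
    and noise variance chosen so that var(genetic) / var(phenotype) = h2.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    beta[support] = rng.choice([-1.0, 1.0], size=s)
    g = X @ beta  # genetic component of the phenotype
    noise_var = np.var(g) * (1.0 - h2) / h2
    y = g + rng.standard_normal(n) * np.sqrt(noise_var)
    b_hat = lasso_ista(X, y, lam)
    selected = np.abs(b_hat) > 1e-3  # hypothetical cutoff for "nonzero"
    return float(selected[support].mean())


if __name__ == "__main__":
    # Sweep the sample size for a fixed number of causal markers (s = 5).
    for n in (10, 25, 50, 150):
        print(n, support_recall(n), support_recall(n, h2=0.5, lam=0.6))
```

Sweeping `n` at fixed `s` is one way to look for the transition from poor to complete selection; lowering `h2` adds noise and would typically call for a larger penalty, which is why the second call uses a different, equally arbitrary `lam`.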
Pages: 17