Identification of Association Between Disease and Multiple Markers Via Sparse Partial Least-Squares Regression

Times cited: 15
Authors
Chun, Hyonho [1]
Ballard, David H. [4]
Cho, Judy [2, 3]
Zhao, Hongyu [1, 2]
Affiliations
[1] Yale Univ, Dept Epidemiol & Publ Hlth, New Haven, CT 06511 USA
[2] Yale Univ, Dept Genet, New Haven, CT 06511 USA
[3] Yale Univ, Div Gastroenterol, Dept Med, IBD Ctr, New Haven, CT 06511 USA
[4] Feinstein Inst Med Res, Ctr Genom & Human Genet, Manhasset, NY USA
Keywords
multi-marker association study; PLS; SPLS; GWAS; PCA; Crohn's disease; genome-wide association; SNPs; gene
DOI
10.1002/gepi.20596
Chinese Library Classification number
Q3 [Genetics]
Subject classification code
071007 [Genetics]
Abstract
Although genome-wide association studies have led to the identification of hundreds of genes underlying dozens of traits in recent years, most published studies have primarily used single marker-based analysis. Intuitively, more information may be utilized when multiple markers are jointly analyzed. Therefore, many methods have been proposed in the literature for association analysis between traits and multiple markers. Simulation and real data analyses comparing these methods have shown that it is often more effective to first reduce the dimensionality of the markers in a region through principal components analysis of all the markers, and then to perform association analysis between traits and those principal components that account for most of the genetic variation in the region. However, one major limitation of this approach is that the principal components are derived purely from marker genotypes, without consideration of their relevance to traits. Furthermore, these components are constructed as linear combinations of all the markers even when only a limited number are potentially relevant to traits. In this manuscript, we propose the use of sparse partial least-squares regression to derive components that are linear combinations of only the relevant markers. This approach is able to use information from both traits and marker genotypes. Extensive simulations and real data analyses on a Crohn's disease data set suggest the superiority of this approach over existing methods. Genet. Epidemiol. 35: 479-486, 2011. (C) 2011 Wiley-Liss, Inc.
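The contrast drawn in the abstract, a principal component computed from genotypes alone versus a sparse direction that also uses the trait, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes additive genotype coding (0/1/2), a simulated quantitative trait fitted by least squares (a simplification for a case-control outcome such as Crohn's disease), and only the first component. The sparse direction soft-thresholds z = X^T y at eta times max|z|, which is one common parameterization of the univariate-response SPLS step and is treated here as an assumption; all function names, the simulated design, and eta = 0.5 are hypothetical.

```python
# Illustrative sketch only; not the authors' code. Names, design and eta are hypothetical.
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def simulate(n: int = 500, p: int = 10, seed: int = 1) -> Tuple[Matrix, List[float]]:
    """Hypothetical design: markers 0-1 affect the trait; markers 2-5 are in perfect LD but inert."""
    rng = random.Random(seed)

    def genotype() -> float:
        # Minor-allele count (0, 1 or 2) at a biallelic marker with allele frequency 0.5.
        return float((rng.random() < 0.5) + (rng.random() < 0.5))

    x, y = [], []
    for _ in range(n):
        row = [genotype() for _ in range(p)]
        shared = genotype()
        for j in range(2, 6):
            row[j] = shared  # identical columns: a high-variance but trait-irrelevant block
        x.append(row)
        y.append(row[0] + row[1] + rng.gauss(0.0, 1.0))  # only markers 0 and 1 matter
    return x, y


def center_columns(x: Matrix) -> Matrix:
    """Subtract each column mean so every marker has mean zero."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in x]


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v


def scores(x: Matrix, w: List[float]) -> List[float]:
    """Component scores t = X w (one value per individual)."""
    return [sum(a * b for a, b in zip(row, w)) for row in x]


def first_pc_direction(x: Matrix, iters: int = 100) -> List[float]:
    """Leading eigenvector of X^T X by power iteration: uses genotypes only, never the trait."""
    p = len(x[0])
    w = normalize([1.0] * p)
    for _ in range(iters):
        t = scores(x, w)
        w = normalize([sum(row[j] * ti for row, ti in zip(x, t)) for j in range(p)])
    return w


def spls_first_direction(x: Matrix, y: List[float], eta: float) -> List[float]:
    """Soft-threshold z = X^T y at eta * max|z|; markers below the cut get weight 0."""
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    z = [sum(row[j] * yi for row, yi in zip(x, yc)) for j in range(len(x[0]))]
    cut = eta * max(abs(v) for v in z)
    return normalize([math.copysign(max(abs(v) - cut, 0.0), v) for v in z])


def r_squared(t: List[float], y: List[float]) -> float:
    """Squared correlation, i.e. R^2 of a least-squares fit of the trait on one component."""
    n = len(y)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((a - mt) ** 2 for a in t)
    syy = sum((b - my) ** 2 for b in y)
    sty = sum((a - mt) * (b - my) for a, b in zip(t, y))
    return sty * sty / (stt * syy)


if __name__ == "__main__":
    x_raw, y = simulate()
    x = center_columns(x_raw)
    w_pca = first_pc_direction(x)
    w_spls = spls_first_direction(x, y, eta=0.5)
    print("PC1 loadings:", [round(w, 2) for w in w_pca])
    print("SPLS weights:", [round(w, 2) for w in w_spls])
    print("R^2 on PC1 :", round(r_squared(scores(x, w_pca), y), 3))
    print("R^2 on SPLS:", round(r_squared(scores(x, w_spls), y), 3))
```

In this simulated design the four identical, trait-irrelevant markers dominate the genotype variance, so the leading principal component loads mainly on them and explains little of the trait, whereas the thresholded direction keeps only markers 0 and 1. A real analysis would extract more than one component and tune eta and the number of components, for example by cross-validation.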
Pages: 479-486
Number of pages: 8