Efficient control of population structure in model organism association mapping

Cited by: 1238
Authors
Kang, Hyun Min [2]
Zaitlen, Noah A. [3]
Wade, Claire M. [4, 5]
Kirby, Andrew [4, 5]
Heckerman, David [6]
Daly, Mark J. [4, 5]
Eskin, Eleazar [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Bioinformat Program, La Jolla, CA 92093 USA
[4] Harvard Univ, Broad Inst, Boston, MA 02141 USA
[5] Massachusetts Gen Hosp, Ctr Human Genet Res, Boston, MA 02114 USA
[6] Microsoft Res, Redmond, WA 98052 USA
DOI
10.1534/genetics.107.080101
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.
Pages: 1709-1723
Page count: 15