Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

Times cited: 63
Authors
Hao, Ke [1]
Chudin, Eugene [1]
McElwee, Joshua [1]
Schadt, Eric E. [1]
Affiliation
[1] Rosetta Inpharmatics, Department of Genetics, Seattle, WA, USA
Source
BMC Genetics | 2009 / Vol. 10
Keywords
MISSING GENOTYPES; SUBSTRUCTURE; ANCESTRY; COVERAGE; LOCI
DOI
10.1186/1471-2156-10-27
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of the SNPs in the human genome are actually surveyed in such studies. In addition, different SNP arrays assay different sets of SNPs, which makes it difficult to compare results and to merge data for meta-analyses. Genome-wide imputation of untyped markers addresses these issues directly.

Methods: 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes for the Ilmn317K SNP set. On these data we compared two imputation methods, MACH and BEAGLE. We imputed 2.5 million HapMap Release 22 SNPs and conducted a GWAS on approximately 40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500K array plus a custom 164K fill-in chip. We then imputed the HapMap SNPs in these samples and quantified imputation accuracy by randomly masking observed SNPs.

Results: MACH and BEAGLE performed similarly with respect to imputation accuracy. The Ilmn650Y array yields excellent imputation performance and outperforms the Affx500K and Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was less successful. Imputing genotypes in the African American population was more challenging because of (1) shorter LD blocks and (2) admixture with Caucasian populations. To address issue (2), we pooled the HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximately 40,000 phenotypes scored in these populations provide a way to determine empirically how imputation affects the power to detect associations. That is, at a fixed false discovery rate, the number of cis-eQTL discoveries made with each method can be interpreted as its relative statistical power in the GWAS. In this study, we find that imputation offers modest additional power (about 4%) on top of either Ilmn317K or Ilmn650Y, much less than the power gained by moving from Ilmn317K to Ilmn650Y (13%).

Conclusion: Current algorithms can accurately impute genotypes for untyped markers, which enables researchers to pool data between studies conducted using different SNP sets. Genotyping itself has a small error rate (e.g. 0.5%), and imputed genotypes are surprisingly accurate by comparison. Dense marker sets (e.g. Ilmn650Y) outperform sparser ones (e.g. Ilmn317K) in imputation yield and accuracy. Imputing genotypes for African American samples was harder, partly because of population admixture, although using a pooled reference boosts performance. Interestingly, GWAS carried out using imputed genotypes increased power only slightly over the assayed SNPs alone. This is likely because adding markers via imputation yields only a modest gain in genetic coverage while worsening the multiple-testing penalty. Furthermore, cis-eQTL mapping using the dense SNP set derived from imputation achieves finer resolution and locates association peaks closer to the causal variants than the conventional approach.
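The two evaluation ideas in the abstract, measuring accuracy by masking observed genotypes and reading relative power off the number of discoveries at a fixed FDR, can be sketched as below. This is a toy illustration under stated assumptions, not the authors' pipeline: genotypes are coded as 0/1/2 minor-allele counts; `mode_impute` is a naive placeholder standing in for MACH or BEAGLE (which model haplotypes and LD and are not reproduced here); and Benjamini-Hochberg is used as one standard FDR procedure because the abstract does not say which FDR method was applied. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence

# Genotypes coded as minor-allele counts (0, 1, 2); None marks a masked call.
Matrix = List[List[Optional[int]]]  # rows = SNPs, columns = samples


def mode_impute(masked: Matrix) -> List[List[int]]:
    """Hypothetical placeholder imputer (NOT MACH/BEAGLE): fill each masked
    call with the most common observed genotype at that SNP, or 0 if none."""
    imputed = []
    for row in masked:
        seen = [g for g in row if g is not None]
        fill = Counter(seen).most_common(1)[0][0] if seen else 0
        imputed.append([fill if g is None else g for g in row])
    return imputed


def masked_concordance(observed: List[List[int]],
                       impute: Callable[[Matrix], List[List[int]]],
                       mask_fraction: float, seed: int = 0) -> float:
    """Hide a random fraction of observed calls, impute them, and return the
    fraction of hidden calls that the imputer recovers exactly."""
    rng = random.Random(seed)
    masked: Matrix = [list(row) for row in observed]
    hidden = []
    for i, row in enumerate(observed):
        for j in range(len(row)):
            if rng.random() < mask_fraction:
                masked[i][j] = None
                hidden.append((i, j))
    if not hidden:
        raise ValueError("no genotypes were masked; increase mask_fraction")
    imputed = impute(masked)
    hits = sum(imputed[i][j] == observed[i][j] for i, j in hidden)
    return hits / len(hidden)


def bh_discoveries(p_values: Sequence[float], fdr: float) -> int:
    """Count rejections under the Benjamini-Hochberg step-up procedure:
    the largest rank k with p_(k) <= k * fdr / m (0 if there is none)."""
    m = len(p_values)
    k = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * rank / m:
            k = rank
    return k


def relative_power_gain(baseline_p: Sequence[float],
                        alternative_p: Sequence[float], fdr: float) -> float:
    """Relative change in discoveries at the same FDR, used as a power proxy."""
    base = bh_discoveries(baseline_p, fdr)
    if base == 0:
        raise ValueError("baseline has no discoveries at this FDR")
    return bh_discoveries(alternative_p, fdr) / base - 1.0
```

With these toy inputs, `relative_power_gain([0.001, 0.03], [0.001, 0.03, 0.6, 0.7, 0.8], 0.05)` returns `-0.5`: appending three uninformative tests raises m, which lowers the BH threshold at every rank, so one discovery is lost. In miniature, this is the trade-off the Conclusion describes, where imputed markers add coverage but also add multiple-testing burden.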
Pages: 10