Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering

Cited by: 2344
Authors
Browning, Sharon R.
Browning, Brian L.
Affiliations
[1] Univ Auckland, Dept Stat, Auckland 1, New Zealand
[2] Univ Auckland, Discipline Nutr, Auckland 1, New Zealand
Funding
Wellcome Trust (UK);
DOI
10.1086/521987
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007; 090102;
Abstract
Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
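The accuracy figure in the abstract ("99% of masked alleles imputed correctly") comes from a masking experiment. Known genotypes are hidden, imputed, and then compared with the truth. The Python sketch below shows only that bookkeeping, under stated assumptions:

- Markers are biallelic, and each genotype is coded as an allele count of 0, 1 or 2.
- Each masked genotype is scored at the allele level as 2 - |true - imputed| matching alleles.

The helper names are hypothetical. `impute_mode` is a naive per-marker placeholder, not Beagle's localized haplotype clustering model.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

# Assumption: biallelic markers; genotype = count of one allele (0, 1, 2); None = missing.
Genotype = Optional[int]
Matrix = List[List[Genotype]]


def mask_genotypes(truth: List[List[int]], fraction: float, seed: int = 0
                   ) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Hide a random fraction of genotype calls; return the masked copy and the hidden cells."""
    rng = random.Random(seed)
    masked: Matrix = [list(row) for row in truth]
    cells = [(i, j) for i, row in enumerate(truth) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_mode(masked: Matrix) -> List[List[int]]:
    """Hypothetical placeholder imputer (NOT Beagle): fill each missing call with the
    most common observed genotype at that marker, or 0 if none is observed."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = fill
    return filled


def masked_allele_concordance(truth, imputed, hidden) -> float:
    """Fraction of masked alleles imputed correctly.

    For allele counts t and g, the number of matching alleles is 2 - |t - g|
    (e.g. true 1, imputed 2: one of the two alleles matches).
    """
    if not hidden:
        raise ValueError("no masked genotypes to score")
    correct = sum(2 - abs(truth[i][j] - imputed[i][j]) for i, j in hidden)
    return correct / (2 * len(hidden))


if __name__ == "__main__":
    truth = [[0, 1, 2, 1], [0, 1, 2, 2], [0, 0, 2, 1]]
    masked, hidden = mask_genotypes(truth, fraction=0.25, seed=1)
    imputed = impute_mode(masked)
    score = masked_allele_concordance(truth, imputed, hidden)
    print(f"masked-allele concordance: {score:.2f}")
```

Replacing `impute_mode` with the output of a real phasing and imputation tool leaves the scoring step unchanged. Only the imputed matrix differs.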
Pages: 1084-1097
Page count: 14
Related papers
32 in total
  • [1] A haplotype map of the human genome
    Altshuler, D
    Brooks, LD
    Chakravarti, A
    Collins, FS
    Daly, MJ
    Donnelly, P
    Gibbs, RA
    Belmont, JW
    Boudreau, A
    Leal, SM
    Hardenbol, P
    Pasternak, S
    Wheeler, DA
    Willis, TD
    Yu, FL
    Yang, HM
    Zeng, CQ
    Gao, Y
    Hu, HR
    Hu, WT
    Li, CH
    Lin, W
    Liu, SQ
    Pan, H
    Tang, XL
    Wang, J
    Wang, W
    Yu, J
    Zhang, B
    Zhang, QR
    Zhao, HB
    Zhao, H
    Zhou, J
    Gabriel, SB
    Barry, R
    Blumenstiel, B
    Camargo, A
    Defelice, M
    Faggart, M
    Goyette, M
    Gupta, S
    Moore, J
    Nguyen, H
    Onofrio, RC
    Parkin, M
    Roy, J
    Stahl, E
    Winchester, E
    Ziaugra, L
    Shen, Y
    [J]. NATURE, 2005, 437 (7063) : 1299 - 1320
  • [2] [Anonymous]
    [J]. COMPUTATIONAL STATISTICS QUARTERLY, 1985, DOI 10.1155/2010/874592
  • [3] Evaluating coverage of genome-wide association studies
    Barrett, Jeffrey C.
    Cardon, Lon R.
    [J]. NATURE GENETICS, 2006, 38 (06) : 659 - 662
  • [4] 2SNP: scalable phasing based on 2-SNP haplotypes
    Brinza, D
    Zelikovsky, A
    [J]. BIOINFORMATICS, 2006, 22 (03) : 371 - 373
  • [5] Efficient multilocus association testing for whole genome association studies using localized haplotype clustering
    Browning, Brian L.
    Browning, Sharon R.
    [J]. GENETIC EPIDEMIOLOGY, 2007, 31 (05) : 365 - 375
  • [6] Multilocus association mapping using variable-length Markov chains
    Browning, Sharon R.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2006, 78 (06) : 903 - 913
  • [7] Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
    Burton, Paul R.
    Clayton, David G.
    Cardon, Lon R.
    Craddock, Nick
    Deloukas, Panos
    Duncanson, Audrey
    Kwiatkowski, Dominic P.
    McCarthy, Mark I.
    Ouwehand, Willem H.
    Samani, Nilesh J.
    Todd, John A.
    Donnelly, Peter
    Barrett, Jeffrey C.
    Davison, Dan
    Easton, Doug
    Evans, David
    Leung, Hin-Tak
    Marchini, Jonathan L.
    Morris, Andrew P.
    Spencer, Chris C. A.
    Tobin, Martin D.
    Attwood, Antony P.
    Boorman, James P.
    Cant, Barbara
    Everson, Ursula
    Hussey, Judith M.
    Jolley, Jennifer D.
    Knight, Alexandra S.
    Koch, Kerstin
    Meech, Elizabeth
    Nutland, Sarah
    Prowse, Christopher V.
    Stevens, Helen E.
    Taylor, Niall C.
    Walters, Graham R.
    Walker, Neil M.
    Watkins, Nicholas A.
    Winzer, Thilo
    Jones, Richard W.
    McArdle, Wendy L.
    Ring, Susan M.
    Strachan, David P.
    Pembrey, Marcus
    Breen, Gerome
    St Clair, David
    Caesar, Sian
    Gordon-Smith, Katherine
    Jones, Lisa
    Fraser, Christine
    Green, Elain K.
    [J]. NATURE, 2007, 447 (7145) : 661 - 678
  • [8] Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium
    Carlson, CS
    Eberle, MA
    Rieder, MJ
    Yi, Q
    Kruglyak, L
    Nickerson, DA
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2004, 74 (01) : 106 - 120
  • [9] HaploRec: efficient and accurate large-scale reconstruction of haplotypes
    Eronen, Lauri
    Geerts, Floris
    Toivonen, Hannu
    [J]. BMC BIOINFORMATICS, 2006, 7 (1)
  • [10] Excoffier, L
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 1995, 12 : 921