Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

Cited by: 323
Authors
Delaneau, Olivier [1]
Marchini, Jonathan [1,2]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Univ Oxford, Ctr Human Genet, Oxford OX3 7BN, England
Funding
UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council
Keywords
GENOTYPE IMPUTATION; DISCOVERY; INFERENCE; FRAMEWORK; READS;
DOI
10.1038/ncomms4934
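Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering]
Abstract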
A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
Pages: 9