Haplotype Estimation Using Sequencing Reads

Cited by: 289
Authors
Delaneau, Olivier [1]
Howie, Bryan [2]
Cox, Anthony J. [3]
Zagury, Jean-Francois [4]
Marchini, Jonathan [1, 5]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Univ Chicago, Dept Human Genet, Chicago, IL 60637 USA
[3] Illumina Cambridge Ltd, UK Computat Biol Grp, Cambridge CB10 1XL, England
[4] Conservatoire Natl Arts & Metiers, Lab Genom Bioinformat & Applicat EA 4627, Chaire Bioinformat, F-75141 Paris 03, France
[5] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
Funding
UK Medical Research Council
Keywords
GENOTYPE IMPUTATION; INFERENCE; DISEASE;
DOI
10.1016/j.ajhg.2013.09.002
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007 ; 090102 ;
Abstract
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved.
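The notion of a phase-informative read in the abstract (a read spanning two or more heterozygous sites) can be illustrated with a minimal sketch. This is not the SHAPEIT2 implementation and ignores base quality scores; the function names, the coordinate convention (inclusive intervals), and the toy positions are assumptions made only for illustration.

```python
from bisect import bisect_left, bisect_right


def covered_het_sites(het_positions, segments):
    """Return the sorted heterozygous positions covered by a read.

    het_positions: sorted list of heterozygous site coordinates (hypothetical input).
    segments: list of (start, end) inclusive aligned intervals; one interval for a
              single read, two for a paired-end fragment (the unsequenced insert
              between the mates covers nothing).
    """
    covered = set()
    for start, end in segments:
        lo = bisect_left(het_positions, start)   # first het site >= start
        hi = bisect_right(het_positions, end)    # one past last het site <= end
        covered.update(het_positions[lo:hi])
    return sorted(covered)


def is_phase_informative(het_positions, segments):
    """A read is phase-informative if it covers at least two heterozygous sites."""
    return len(covered_het_sites(het_positions, segments)) >= 2


# Toy example with made-up coordinates.
hets = [100, 250, 900, 5000]
single_read = [(90, 189)]               # one 100 bp read
paired_read = [(200, 299), (850, 949)]  # two 100 bp mates with an insert between
long_read = [(50, 6000)]                # one long read

print(is_phase_informative(hets, single_read))
print(is_phase_informative(hets, paired_read))
print(is_phase_informative(hets, long_read))
```

In this toy setting the paired-end fragment links two sites that no single 100 bp read could reach, which is consistent with the abstract's observation that insert size and read length affect how much phase information the reads carry.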
Pages: 687-696
Page count: 10