Genotype calling and haplotyping in parent-offspring trios

Times cited: 36
Authors
Chen, Wei [1, 2]
Li, Bingshan [3 ]
Zeng, Zhen [2 ]
Sanna, Serena [4 ]
Sidore, Carlo [4, 5, 6]
Busonero, Fabio [4, 5]
Kang, Hyun Min [5 ]
Li, Yun [7 ]
Abecasis, Goncalo R. [5 ]
Affiliations
[1] Univ Pittsburgh, Sch Med, Childrens Hosp Pittsburgh, UPMC, Div Pediat Pulm Med Allergy & Immunol, Dept Pediat, Pittsburgh, PA 15224 USA
[2] Univ Pittsburgh, Sch Publ Hlth, Dept Biostat, Pittsburgh, PA 15224 USA
[3] Vanderbilt Univ, Med Ctr, Dept Mol Physiol & Biophys & Neurol, Ctr Human Genet Res, Nashville, TN 37232 USA
[4] CNR, Ist Ric Genet & Biomed, I-09042 Cagliari, Italy
[5] Univ Michigan, Dept Biostat, Ctr Stat Genet, Ann Arbor, MI 48105 USA
[6] Univ Sassari, Dipartimento Sci Biomed, I-07100 Sardinia, Italy
[7] Univ N Carolina, Dept Biostat, Dept Genet, Chapel Hill, NC 27599 USA
Funding
National Institutes of Health (USA)
Keywords
GENOME-WIDE ASSOCIATION; MISSING HERITABILITY; RARE VARIANTS; SEQUENCE; DISEASES; FORMAT;
DOI
10.1101/gr.142455.112
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Emerging sequencing technologies allow common and rare variants to be systematically assayed across the human genome in many individuals. To improve variant detection and genotype calling, raw sequence data are typically examined across many individuals. Here, we describe a method for genotype calling in settings where sequence data are available for unrelated individuals and parent-offspring trios, and show that modeling trio information can greatly increase the accuracy of inferred genotypes and haplotypes, especially with low- to modest-depth sequencing data. Our method considers both linkage disequilibrium (LD) patterns and the constraints imposed by family structure when assigning individual genotypes and haplotypes. Using simulations, we show that trios provide higher genotype calling accuracy across the frequency spectrum, both overall and at hard-to-call heterozygous sites. In addition, trios provide greatly improved phasing accuracy, which in turn improves the accuracy of downstream analyses (such as genotype imputation) that rely on phased haplotypes. To further evaluate our approach, we analyzed data on the first 508 individuals sequenced by the SardiNIA sequencing project. Our results show that our method reduces the genotyping error rate by 50% compared with analysis using existing methods that ignore family structure. We anticipate our method will facilitate genotype calling and haplotype inference for many ongoing sequencing projects.
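The following is a minimal single-site sketch of how the constraints imposed by family structure can sharpen genotype calls. It is not the method described in the abstract, which also models LD across sites. All names here (`mendel`, `hwe_prior`, `trio_posterior`, `child_marginal`) are hypothetical, and the sketch assumes biallelic sites, Hardy-Weinberg priors for the parents, and no de novo mutation.

```python
from itertools import product


def transmit(parent, allele):
    """P(a parent carrying `parent` alt alleles (0, 1 or 2) transmits `allele` (0 or 1))."""
    p_alt = parent / 2.0
    return p_alt if allele == 1 else 1.0 - p_alt


def mendel(father, mother, child):
    """P(child genotype | parental genotypes) under Mendelian inheritance."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):  # allele from father, allele from mother
        if a + b == child:
            total += transmit(father, a) * transmit(mother, b)
    return total


def hwe_prior(genotype, freq):
    """Hardy-Weinberg prior for a founder genotype, given alt allele frequency `freq`."""
    return ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)[genotype]


def trio_posterior(lik_f, lik_m, lik_c, freq):
    """Joint posterior over (father, mother, child) genotypes.

    Each lik_* is a length-3 sequence of genotype likelihoods P(reads | genotype).
    """
    joint = {}
    for f, m, c in product(range(3), repeat=3):
        joint[(f, m, c)] = (
            hwe_prior(f, freq) * hwe_prior(m, freq) * mendel(f, m, c)
            * lik_f[f] * lik_m[m] * lik_c[c]
        )
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}


def child_marginal(posterior):
    """Marginal posterior over the child's genotype."""
    out = [0.0, 0.0, 0.0]
    for (_, _, c), p in posterior.items():
        out[c] += p
    return out
```

In this sketch, a child whose own reads are ambiguous between genotypes is pulled toward the genotypes compatible with the parents' calls, which is the intuition behind using trio information at low sequencing depth.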
Pages: 142-151
Number of pages: 10