Genotype calling and haplotyping in parent-offspring trios

Times cited: 36
Authors
Chen, Wei [1, 2]
Li, Bingshan [3]
Zeng, Zhen [2]
Sanna, Serena [4]
Sidore, Carlo [4, 5, 6]
Busonero, Fabio [4, 5]
Kang, Hyun Min [5]
Li, Yun [7]
Abecasis, Goncalo R. [5]
Affiliations
[1] Univ Pittsburgh, Sch Med, Childrens Hosp Pittsburgh,UPMC, Div Pediat Pulm Med Allergy & Immunol,Dept Pediat, Pittsburgh, PA 15224 USA
[2] Univ Pittsburgh, Sch Publ Hlth, Dept Biostat, Pittsburgh, PA 15224 USA
[3] Vanderbilt Univ, Med Ctr, Dept Mol Physiol & Biophys & Neurol, Ctr Human Genet Res, Nashville, TN 37232 USA
[4] CNR, Ist Ric Genet & Biomed, I-09042 Cagliari, Italy
[5] Univ Michigan, Dept Biostat, Ctr Stat Genet, Ann Arbor, MI 48105 USA
[6] Univ Sassari, Dipartimento Sci Biomed, I-07100 Sardinia, Italy
[7] Univ N Carolina, Dept Biostat, Dept Genet, Chapel Hill, NC 27599 USA
Funding
US National Institutes of Health;
Keywords
GENOME-WIDE ASSOCIATION; MISSING HERITABILITY; RARE VARIANTS; SEQUENCE; DISEASES; FORMAT;
DOI
10.1101/gr.142455.112
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Emerging sequencing technologies allow common and rare variants to be systematically assayed across the human genome in many individuals. In order to improve variant detection and genotype calling, raw sequence data are typically examined across many individuals. Here, we describe a method for genotype calling in settings where sequence data are available for unrelated individuals and parent-offspring trios and show that modeling trio information can greatly increase the accuracy of inferred genotypes and haplotypes, especially on low- to modest-depth sequencing data. Our method considers both linkage disequilibrium (LD) patterns and the constraints imposed by family structure when assigning individual genotypes and haplotypes. Using simulations, we show that trios provide higher genotype calling accuracy across the frequency spectrum, both overall and at hard-to-call heterozygous sites. In addition, trios provide greatly improved phasing accuracy, improving the accuracy of downstream analyses (such as genotype imputation) that rely on phased haplotypes. To further evaluate our approach, we analyzed data on the first 508 individuals sequenced by the SardiNIA sequencing project. Our results show that our method reduces the genotyping error rate by 50% compared with analysis using existing methods that ignore family structure. We anticipate our method will facilitate genotype calling and haplotype inference for many ongoing sequencing projects.
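The family-structure constraint mentioned in the abstract can be illustrated at a single biallelic site. The sketch below is not the authors' method: it ignores LD and haplotypes entirely, assumes no de novo mutation, and uses a Hardy-Weinberg prior on the parents. All function names and the example likelihood values are hypothetical, chosen only for illustration.

```python
from itertools import product

# Genotypes are coded as the number of alternate alleles: 0, 1, or 2.

def transmission_prob(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian inheritance
    (assumption: no de novo mutation)."""
    # Probability that a parent with a given genotype transmits the alternate allele.
    p_alt = {0: 0.0, 1: 0.5, 2: 1.0}
    f, m = p_alt[father], p_alt[mother]
    return {
        0: (1 - f) * (1 - m),
        1: f * (1 - m) + (1 - f) * m,
        2: f * m,
    }[child]

def hwe_prior(genotype, alt_freq):
    """Hardy-Weinberg prior for a founder genotype (assumption of this sketch)."""
    q = alt_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2][genotype]

def call_trio(lik_father, lik_mother, lik_child, alt_freq):
    """Return the most probable (father, mother, child) genotype triple
    and its posterior probability.

    Each lik_* is a list of three genotype likelihoods, P(reads | genotype)."""
    joint = {}
    for gf, gm, gc in product(range(3), repeat=3):
        joint[(gf, gm, gc)] = (
            hwe_prior(gf, alt_freq)
            * hwe_prior(gm, alt_freq)
            * transmission_prob(gc, gf, gm)
            * lik_father[gf]
            * lik_mother[gm]
            * lik_child[gc]
        )
    total = sum(joint.values())
    best = max(joint, key=joint.get)
    return best, joint[best] / total

# Hypothetical example: both parents look confidently homozygous reference,
# while the child's low-depth reads weakly favor a heterozygous call.
parents = [1.0, 1e-3, 1e-6]
child = [0.4, 0.6, 1e-6]
best_triple, posterior = call_trio(parents, parents, child, alt_freq=0.01)
```

In this toy case the joint call assigns the child the homozygous reference genotype, because a heterozygous child with two homozygous reference parents has zero transmission probability. A full trio-aware caller would combine this kind of constraint with LD information across neighboring sites.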
Pages: 142-151
Page count: 10