Imputation of Missing Genotypes From Sparse to High Density Using Long-Range Phasing

Times cited: 53
Authors
Daetwyler, Hans D. [1 ,2 ,3 ,4 ]
Wiggans, George R. [5 ]
Hayes, Ben J. [1 ]
Woolliams, John A. [2 ,3 ]
Goddard, Mike E. [1 ,6 ]
Affiliations
[1] Dept Primary Ind, Biosci Res Div, Bundoora, Vic 3083, Australia
[2] Univ Edinburgh, Roslin Inst, Roslin EH25 9RG, Midlothian, Scotland
[3] Univ Edinburgh, R D SVS, Roslin EH25 9RG, Midlothian, Scotland
[4] Wageningen Univ, Anim Breeding & Genom Ctr, NL-6700 AH Wageningen, Netherlands
[5] ARS, Anim Improvement Programs Lab, USDA, Beltsville, MD 20705 USA
[6] Univ Melbourne, Fac Land & Environm, Parkville, Vic 3010, Australia
Funding
Biotechnology and Biological Sciences Research Council (UK);
Keywords
ACCURACY; INFERENCE; HAPLOTYPES; SELECTION; DESCENT; VALUES; MODEL;
DOI
10.1534/genetics.111.128082
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% of loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data.
Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.
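The segment-sharing step described in the abstract can be illustrated with a minimal sketch. This is not the ChromoPhase implementation: it assumes a single already-phased haplotype per individual, alleles coded 0/1 with `None` for missing, and a hypothetical threshold `min_informative` for how many matching observed loci are needed before a run is treated as shared. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

Allele = Optional[int]  # 0 or 1; None marks a missing allele (assumed coding)


def shared_segments(proband: List[Allele], relative: List[Allele],
                    min_informative: int = 3) -> List[Tuple[int, int]]:
    """Find inclusive (start, end) index ranges where the two haplotypes agree.

    A segment runs from the first to the last matching observed locus and is
    kept only if it contains at least `min_informative` matches. Any observed
    mismatch ends the current segment.
    """
    if len(proband) != len(relative):
        raise ValueError("haplotypes must cover the same loci")
    segments: List[Tuple[int, int]] = []
    first = last = None
    matches = 0
    for i, (p, r) in enumerate(zip(proband, relative)):
        if p is None or r is None:
            continue  # uninformative locus: neither supports nor breaks a segment
        if p == r:
            if first is None:
                first = i
            last = i
            matches += 1
        else:
            if matches >= min_informative:
                segments.append((first, last))
            first, last, matches = None, None, 0
    if matches >= min_informative:
        segments.append((first, last))
    return segments


def impute_from_relative(proband: List[Allele], relative: List[Allele],
                         min_informative: int = 3) -> List[Allele]:
    """Fill missing proband alleles from the relative inside shared segments."""
    filled = list(proband)
    for start, end in shared_segments(proband, relative, min_informative):
        for i in range(start, end + 1):
            if filled[i] is None and relative[i] is not None:
                filled[i] = relative[i]
    return filled


# Sparse proband haplotype and a densely typed relative (made-up toy data).
sparse = [0, None, None, 1, None, 0, None, 1, None, None]
dense = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
imputed = impute_from_relative(sparse, dense)
```

In this toy example, only the missing alleles between the first and last matching loci are filled; loci beyond the mismatch stay missing because no shared segment supports them. A real method would cycle over many relatives and genomic surrogates and work from unphased genotypes.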
Pages: 317 / U1028
Page count: 19