A comparison of phasing algorithms for trios and unrelated individuals

Cited by: 228
Authors
Marchini, J
Cutler, D
Patterson, N
Stephens, M
Eskin, E
Halperin, E
Lin, S
Qin, ZS
Munro, HM
Abecasis, GR
Donnelly, P
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Baltimore, MD USA
[3] MIT, Broad Inst, Cambridge, MA 02139 USA
[4] Harvard Univ, Cambridge, MA 02138 USA
[5] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[6] Hebrew Univ Jerusalem, Dept Comp Sci, Jerusalem, Israel
[7] Int Comp Sci Inst, Berkeley, CA USA
[8] Univ Michigan, Dept Biostat, Ctr Stat Genet, Ann Arbor, MI 48109 USA
DOI
10.1086/500808
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r^2 between a pair of SNPs and concluded that all methods estimated r^2 well when the estimated value was similar to 0.8.
Pages: 437-450
Number of pages: 14