Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals

Cited by: 5
Authors
Li, Shuying Sue [1]
Cheng, Jacob Jen-Hao [1]
Zhao, Lue Ping [1]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Div Publ Hlth, Seattle, WA 98104 USA
DOI
10.1186/1471-2156-8-2
Chinese Library Classification number
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
Background: The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease association studies. A key aspect of this development is the statistical inference of individual diplotypes from unphased genotypes. Several methods for inferring haplotypes have been developed, but they have not been evaluated extensively to determine which one not only performs well but can also be incorporated easily into downstream haplotype-based association analyses. In this paper, we attempt such an evaluation by comparing the two leading Bayesian methods, implemented in PHASE and HAPLOTYPER, with the two leading empirical methods, implemented in PL-EM and HPlus. We applied these methods to real data, namely the dense X-chromosome genotypes of 30 European and 30 African trios provided by the International HapMap Project, and to simulated genotype data. Our conclusions are based on these analyses.
Results: All programs performed very well on the X-chromosome data, with an average similarity index of 0.99 and an average prediction rate of 0.99 for both European and African trios. On data simulated under an approximation of the coalescent, PHASE, which implements a Bayesian method based on that coalescent approximation, outperformed the other programs at small sample sizes. As the sample size increased, the other programs performed as well as PHASE. PL-EM and HPlus, which implement empirical methods, required much less running time than the programs implementing the Bayesian methods: only one hundredth to one thousandth of the running time required by PHASE, particularly when analyzing large sample sizes and large numbers of SNPs.
Conclusion: For large sample sizes (hundreds or more), which most association studies require, the two empirical methods might be used, since they infer haplotypes as accurately as the Bayesian methods and can be incorporated easily into downstream haplotype-based analyses such as haplotype-association analyses.
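To make the idea of an empirical, likelihood-based approach concrete, the following is a minimal sketch of a textbook EM algorithm for estimating haplotype frequencies from unphased genotypes. It is not the algorithm implemented in PL-EM or HPlus; the function names, the genotype coding (0, 1 or 2 copies of one allele per SNP), the Hardy-Weinberg assumption, and the toy data are all assumptions made for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts (0, 1 or 2) per SNP (illustrative coding).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed: 0 -> 0, 2 -> 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # heterozygous sites carry opposite alleles
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pl in pair_lists for pair in pl for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pl in pair_lists:
            # E-step: posterior weight of each compatible pair under current freqs
            w = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pl]
            total = sum(w)
            for (a, b), wi in zip(pl, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


# Toy data (made up): 4 individuals (0,0), 4 individuals (2,2), 2 double heterozygotes.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimates = em_haplotype_freqs(toy)
for hap, f in sorted(estimates.items()):
    print(hap, round(f, 4))
```

In this toy data set only the double heterozygotes are ambiguous, and because the homozygous individuals carry only the 00 and 11 haplotypes, the EM iterations assign the ambiguous individuals almost entirely to the 00/11 phase. The number of compatible pairs doubles with each additional heterozygous site, so a sketch of this kind is only practical for a small number of SNPs.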
Pages: 10