Algorithms for inferring haplotypes

Cited by: 123
Author
Niu, TH [1]
Affiliation
[1] Harvard Univ, Sch Med, Brigham & Womens Hosp, Div Prevent Med, Dept Med, Boston, MA 02115 USA
Keywords
haplotype; genotype; phase; single-nucleotide polymorphism; algorithm
DOI
10.1002/gepi.20024
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Haplotype phase information in diploid organisms provides valuable information on human evolutionary history and may lead to the development of more efficient strategies to identify genetic variants that increase susceptibility to human diseases. Molecular haplotyping methods are labor-intensive, low-throughput, and very costly. Therefore, algorithms based on formal statistical theories were shown to be very effective and cost-efficient for haplotype reconstruction. This review covers 1) population-based haplotype inference methods: Clark's algorithm, expectation-maximization (EM) algorithm, coalescence-based algorithms (pseudo-Gibbs sampler and perfect/imperfect phylogeny), and partition-ligation algorithm implemented by a fully Bayesian model (Haplotyper) or by EM (PLEM); 2) family-based haplotype inference methods; 3) the handling of genotype scoring uncertainties (i.e., genotyping errors and raw two-dimensional genotype scatterplots) in inferring haplotypes; and 4) haplotype inference methods for pooled DNA samples. The advantages and limitations of each algorithm are discussed. By using simulations based on empirical data on the G6PD gene and TNFRSF5 gene, I demonstrate that different algorithms have different degrees of sensitivity to various extents of population diversities and genotyping error rates. Future development of statistical algorithms for addressing haplotype reconstruction will resort more and more to ideas based on combinatorial mathematics, graphical models, and machine learning, and they will have profound impacts on population genetics and genetic epidemiology with the advent of the human HapMap. Genet. Epidemiol. © 2004 Wiley-Liss, Inc.
Pages: 334-347
Page count: 14