Understanding the accuracy of statistical haplotype inference with sequence data of known phase

Cited by: 54
Authors
Andres, Aida M.
Clark, Andrew G.
Shimmin, Lawrence
Boerwinkle, Eric
Sing, Charles F.
Hixson, James E.
Affiliations
[1] NHGRI, Natl Inst Hlth, Bethesda, MD 20892 USA
[2] Cornell Univ, Dept Mol Biol & Genet, Ithaca, NY 14853 USA
[3] Univ Texas, Ctr Hlth Sci, Ctr Human Genet, Houston, TX USA
[4] Univ Michigan, Dept Human Genet, Ann Arbor, MI 48109 USA
Keywords
kallikrein; KLK; haplotype reconstruction; phase; LD
DOI
10.1002/gepi.20185
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Statistical methods for haplotype inference from multi-site genotypes of unrelated individuals have important applications in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single-chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We employ these phase-known genotypes and coalescent simulations to assess the accuracy of statistical haplotype reconstruction by several algorithms. Accuracy of phase inference was quite low in our biological data even for regions as short as 25-50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the estimated confidence in phase inference is not reliable enough to allow site-specific uncertainty information to be incorporated in subsequent analyses. We show that, in samples of certain mixed ancestry (AA and EA populations), the most accurate haplotypes are probably obtained when increasing sample size by considering the largest, pooled sample, despite the hypothetical problems associated with pooling across those heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed.
Pages: 659-671
Page count: 13
Related papers
45 in total
[1]   A haplotype map of the human genome [J].
Altshuler, D ;
Brooks, LD ;
Chakravarti, A ;
Collins, FS ;
Daly, MJ ;
Donnelly, P ;
Gibbs, RA ;
Belmont, JW ;
Boudreau, A ;
Leal, SM ;
Hardenbol, P ;
Pasternak, S ;
Wheeler, DA ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Zeng, CQ ;
Gao, Y ;
Hu, HR ;
Hu, WT ;
Li, CH ;
Lin, W ;
Liu, SQ ;
Pan, H ;
Tang, XL ;
Wang, J ;
Wang, W ;
Yu, J ;
Zhang, B ;
Zhang, QR ;
Zhao, HB ;
Zhao, H ;
Zhou, J ;
Gabriel, SB ;
Barry, R ;
Blumenstiel, B ;
Camargo, A ;
Defelice, M ;
Faggart, M ;
Goyette, M ;
Gupta, S ;
Moore, J ;
Nguyen, H ;
Onofrio, RC ;
Parkin, M ;
Roy, J ;
Stahl, E ;
Winchester, E ;
Ziaugra, L ;
Shen, Y .
NATURE, 2005, 437 (7063) :1299-1320
[2]   Haploview: analysis and visualization of LD and haplotype maps [J].
Barrett, JC ;
Fry, B ;
Maller, J ;
Daly, MJ .
BIOINFORMATICS, 2005, 21 (02) :263-265
[3]   Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium [J].
Carlson, CS ;
Eberle, MA ;
Rieder, MJ ;
Yi, Q ;
Kruglyak, L ;
Nickerson, DA .
AMERICAN JOURNAL OF HUMAN GENETICS, 2004, 74 (01) :106-120
[4]   Perfect phylogeny haplotyper: haplotype inferral using a tree model [J].
Chung, RH ;
Gusfield, D .
BIOINFORMATICS, 2003, 19 (06) :780-781
[5]   The role of haplotypes in candidate gene studies [J].
Clark, AG .
GENETIC EPIDEMIOLOGY, 2004, 27 (04) :321-333
[6]  
CLARK AG, 1990, MOL BIOL EVOL, V7, P111
[7]   Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase [J].
Clark, AG ;
Weiss, KM ;
Nickerson, DA ;
Taylor, SL ;
Buchanan, A ;
Stengård, J ;
Salomaa, V ;
Vartiainen, E ;
Perola, M ;
Boerwinkle, E ;
Sing, CF .
AMERICAN JOURNAL OF HUMAN GENETICS, 1998, 63 (02) :595-612
[8]   Use of unphased multilocus genotype data in indirect association studies [J].
Clayton, D ;
Chapman, J ;
Cooper, J .
GENETIC EPIDEMIOLOGY, 2004, 27 (04) :415-428
[9]   Experimentally-derived haplotypes substantially increase the efficiency of linkage disequilibrium studies [J].
Douglas, JA ;
Boehnke, M ;
Gillanders, E ;
Trent, JA ;
Gruber, SB .
NATURE GENETICS, 2001, 28 (04) :361-364
[10]  
EXCOFFIER L, 1995, MOL BIOL EVOL, V12, P921