Estimating haplotype frequencies and standard errors for multiple single nucleotide polymorphisms

Cited by: 64
Authors
Li, SSY
Khalid, N
Carlson, C
Zhao, LP
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Div Publ Hlth Sci, Seattle, WA 98109 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Keywords
estimating equation; haplotype; Hardy-Weinberg equilibrium; single nucleotide polymorphism (SNP)
DOI
10.1093/biostatistics/4.4.513
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Estimating haplotype frequencies becomes increasingly important in the mapping of complex disease genes, as millions of single nucleotide polymorphisms (SNPs) are being identified and genotyped. When genotypes at multiple SNP loci are gathered from unrelated individuals, haplotype frequencies can be accurately estimated using expectation-maximization (EM) algorithms (Excoffier and Slatkin, 1995; Hawley and Kidd, 1995; Long et al., 1995), with standard errors estimated using bootstraps. However, because the number of possible haplotypes increases exponentially with the number of SNPs, handling data with a large number of SNPs poses a computational challenge for the EM methods and for other haplotype inference methods. To solve this problem, Niu and colleagues, in their Bayesian haplotype inference paper (Niu et al., 2002), introduced a computational algorithm called progressive ligation (PL). But their Bayesian method has a limitation on the number of subjects (no more than 100 subjects in the current implementation of the method). In this paper, we propose a new method in which we use the same likelihood formulation as in Excoffier and Slatkin's EM algorithm and apply the estimating equation idea and the PL computational algorithm with some modifications. Our proposed method can handle data sets with large numbers of SNPs as well as large numbers of subjects. At the same time, our method estimates standard errors efficiently, using the sandwich estimate from the estimating equation rather than the bootstrap method. Additionally, our method admits missing data and produces valid estimates of parameters and their standard errors under the assumption that the missing genotypes are missing at random in the sense defined by Rubin (1976).
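As an illustration of the EM baseline the abstract refers to (not the estimating-equation and progressive-ligation method proposed in the paper), the following is a minimal Python sketch of EM haplotype frequency estimation from unphased genotypes under Hardy-Weinberg equilibrium. The function names `compatible_pairs` and `em_haplotype_freqs`, the genotype coding (0, 1, or 2 copies of one allele per SNP), and the uniform starting values are assumptions made for this example. The sketch enumerates all 2^m haplotypes, so it is only practical for a handful of SNPs, and it handles neither missing genotypes nor standard errors.

```python
from itertools import product


def compatible_pairs(genotype):
    """List ordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple of allele counts (0, 1, or 2) per SNP (assumed coding).
    """
    per_locus = []
    for g in genotype:
        if g == 0:
            per_locus.append([(0, 0)])
        elif g == 2:
            per_locus.append([(1, 1)])
        else:  # heterozygous: phase is unknown, so both orders are possible
            per_locus.append([(0, 1), (1, 0)])
    pairs = []
    for combo in product(*per_locus):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    n_snps = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_snps))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform start (assumption)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        # E-step: distribute each subject's two haplotypes over compatible pairs
        for g in genotypes:
            pairs = compatible_pairs(g)
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Toy data: two SNPs, four unrelated subjects, one double heterozygote
toy_genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
toy_freqs = em_haplotype_freqs(toy_genotypes)
```

The double heterozygote `(1, 1)` is the only phase-ambiguous subject in the toy data; the E-step splits its contribution between the two possible phase configurations in proportion to the current frequency estimates.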
Pages: 513-522
Page count: 10
Related papers
23 in total
[1] Characterization of single-nucleotide polymorphisms in coding regions of human genes [J].
Cargill, M; Altshuler, D; Ireland, J; Sklar, P; Ardlie, K; Patil, N; Lane, CR; Lim, EP; Kalyanaraman, N; Nemesh, J; Ziaugra, L; Friedland, L; Rolfe, A; Warrington, J; Lipshutz, R; Daley, GQ; Lander, ES.
NATURE GENETICS, 1999, 22 (03): 231-238
[2] High-resolution haplotype structure in the human genome [J].
Daly, MJ; Rioux, JD; Schaffner, SE; Hudson, TJ; Lander, ES.
NATURE GENETICS, 2001, 29 (02): 229-232
[3] Excoffier, 2000, ARLEQUIN VERSION 2 0
[4] EXCOFFIER L, 1995, MOL BIOL EVOL, V12, P921
[5] Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data [J].
Fallin, D; Schork, NJ.
AMERICAN JOURNAL OF HUMAN GENETICS, 2000, 67 (04): 947-959
[6] Islands of linkage disequilibrium [J].
Goldstein, DB.
NATURE GENETICS, 2001, 29 (02): 109-111
[7] GREEN ED, 1998, GENETIC BASIS HUMAN, P33
[8] Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis [J].
Halushka, MK; Fan, JB; Bentley, K; Hsie, L; Shen, NP; Weder, A; Cooper, R; Lipshutz, R; Chakravarti, A.
NATURE GENETICS, 1999, 22 (03): 239-247
[9] HAPLO - A PROGRAM USING THE EM ALGORITHM TO ESTIMATE THE FREQUENCIES OF MULTISITE HAPLOTYPES [J].
HAWLEY, ME; KIDD, KK.
JOURNAL OF HEREDITY, 1995, 86 (05): 409-411
[10] Generating samples under a Wright-Fisher neutral model of genetic variation [J].
Hudson, RR.
BIOINFORMATICS, 2002, 18 (02): 337-338