Estimation of haplotype frequencies, linkage-disequilibrium measures, and combination of haplotype copies in each pool by use of pooled DNA data

Cited by: 46
Authors
Ito, T
Chiku, S
Inoue, E
Tomita, M
Morisaki, T
Morisaki, H
Kamatani, N
Affiliations
[1] Japan Biol Informat Consortium, Japan Biol Informat Res Ctr, Algorithm Team, Koto Ku, Tokyo 1350064, Japan
[2] Tokyo Womens Med Univ, Inst Rheumatol, Tokyo, Japan
[3] Tokyo Womens Med Univ, Dept Appl Biomed Engn & Sci, Div Genom Med, Tokyo, Japan
[4] Natl Cardiovasc Ctr, Res Inst, Dept Biosci, Osaka, Japan
DOI
10.1086/346116
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Inference of haplotypes is important for many genetic approaches, including the process of assigning a phenotype to a genetic region. Usually, the population frequencies of haplotypes, as well as the diplotype configuration of each subject, are estimated from a set of genotypes of the subjects in a sample from the population. We have developed an algorithm to infer haplotype frequencies and the combination of haplotype copies in each pool by using pooled DNA data. The input data are the genotypes in pooled DNA samples, each of which contains the quantitative genotype data from one to six subjects. The algorithm infers by the maximum-likelihood method both frequencies of the haplotypes in the population and the combination of haplotype copies in each pool by an expectation-maximization algorithm. The algorithm was implemented in the computer program LDPooled. We also used the bootstrap method to calculate the standard errors of the estimated haplotype frequencies. Using this program, we analyzed the published genotype data for the SAA (n = 156), MTHFR (n = 80), and NAT2 (n = 116) genes, as well as the smoothelin gene (n = 102). Our study has shown that the frequencies of major (frequency >0.1 in a population) haplotypes can be inferred rather accurately from the pooled DNA data by the maximum-likelihood method, although with some limitations. The estimated D and D' values had large variations except when |D| values were >0.1. The estimated linkage-disequilibrium measure ρ² for 36 linked loci of the smoothelin gene when one- and two-subject pool protocols were used suggested that the gross pattern of the distribution of the measure can be reproduced using the two-subject pool data.
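The idea in the abstract can be illustrated with a minimal sketch. This is not the LDPooled implementation. It assumes only two biallelic loci, error-free allele counts per pool, and random mating, so that the haplotype copies in a pool behave as independent draws. The function names (`compatible_configs`, `em_pooled`, `ld_measures`) and the `(a, b, m)` input format are hypothetical, and ρ² is taken here to be the squared allelic correlation, which is an assumption about the paper's definition.

```python
from math import factorial

# Haplotype order throughout: (00, 01, 10, 11), alleles coded 0/1 at two loci.

def compatible_configs(a, b, m):
    """Yield every haplotype-copy combination (n00, n01, n10, n11) consistent
    with a pool of m haplotype copies showing a copies of allele 1 at locus 1
    and b copies of allele 1 at locus 2."""
    for n11 in range(max(0, a + b - m), min(a, b) + 1):
        yield (m - a - b + n11, b - n11, a - n11, n11)

def multinomial_prob(counts, freqs):
    """Multinomial probability of a haplotype-copy combination."""
    coef = factorial(sum(counts))
    prob = 1.0
    for n, p in zip(counts, freqs):
        coef //= factorial(n)
        prob *= p ** n
    return coef * prob

def em_pooled(pools, n_iter=200, tol=1e-10):
    """EM estimate of the four haplotype frequencies.
    pools: list of (a, b, m) tuples as described in compatible_configs."""
    freqs = [0.25] * 4  # assumed starting point: uniform frequencies
    total_copies = sum(m for _, _, m in pools)
    for _ in range(n_iter):
        expected = [0.0] * 4
        for a, b, m in pools:
            # E-step: posterior weight of each compatible combination
            configs = list(compatible_configs(a, b, m))
            weights = [multinomial_prob(c, freqs) for c in configs]
            total = sum(weights)
            for c, w in zip(configs, weights):
                for i in range(4):
                    expected[i] += c[i] * w / total
        # M-step: expected haplotype counts divided by total copies
        new_freqs = [e / total_copies for e in expected]
        change = max(abs(x - y) for x, y in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs

def ld_measures(freqs):
    """Return (D, D', r^2) for the two loci from haplotype frequencies."""
    p00, p01, p10, p11 = freqs
    p_a = p10 + p11  # frequency of allele 1 at locus 1
    p_b = p01 + p11  # frequency of allele 1 at locus 2
    d = p11 - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # signed D'
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Made-up two-subject pools (m = 4 haplotype copies each), for illustration only.
example_pools = [(0, 0, 4), (1, 1, 4), (2, 2, 4), (4, 4, 4), (1, 0, 4), (2, 1, 4)]
estimated = em_pooled(example_pools)
print(estimated, ld_measures(estimated))
```

In this sketch the latent variable is the combination of haplotype copies in each pool, which matches the quantity the abstract says is inferred alongside the population frequencies. With more loci or larger pools the number of compatible combinations grows quickly, so a practical program would need a more careful enumeration than the single loop used here.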
Pages: 384-398
Number of pages: 15