Accuracy of coalescent likelihood estimates: Do we need more sites, more sequences, or more loci?

Cited by: 244
Authors
Felsenstein, J [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Dept Biol, Seattle, WA 98195 USA
Keywords
coalescent; maximum likelihood; population size; sampling design
DOI
10.1093/molbev/msj079
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A computer simulation study has been made of the accuracy of estimates of Θ = 4N_eμ from a sample drawn from a single isolated population of finite size. The accuracies turn out to be well predicted by a formula developed by Fu and Li, who used optimistic assumptions. Their formulas are restated here in terms of accuracy, defined as the reciprocal of the squared coefficient of variation (for an unbiased estimate Θ̂, this is Θ²/Var(Θ̂)). Accuracy should be proportional to sample size when the entities sampled provide independent information. Using these formulas for accuracy, the sampling strategy for estimating Θ can be investigated. Two cost models have been used: a cost-per-base model and a cost-per-read model. The former would lead us to prefer a very large number of loci, each one base long. The latter, which is more realistic, leads us to prefer one read per locus and an optimum sample size that declines as the cost of sampling organisms increases. For realistic values, the optimum sample size is 8 or fewer individuals. This is quite close to the results that Pluzhnikov and Donnelly obtained for a cost-per-base model when evaluating other estimators of Θ. The result can be understood by noting that the resources spent collecting larger samples prevent us from examining more loci. The efficiency of Watterson's estimator of Θ was also examined, and it was found to be reasonably efficient when the number of mutants per generation in the sequence in the whole population is less than 2.5.
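The accuracy bookkeeping described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code. It assumes that the Fu and Li formula is the minimum-variance bound Var(Θ̂) ≥ Θ / Σ_{k=1}^{n-1} 1/(k + Θ) for one non-recombining locus with n sequences (Θ here is the per-locus value), that Watterson's estimator has the standard infinite-sites variance Θ/a1 + a2·Θ²/a1², and that the accuracies of unlinked loci simply add. The cost model in best_sample_size and all function and parameter names (budget, cost_per_individual, cost_per_read, n_max) are hypothetical: each sampled individual is assumed to have a fixed collection cost, and each (individual, locus) read a fixed sequencing cost.

```python
import math


def fu_li_accuracy(theta: float, n: int) -> float:
    """Accuracy (theta^2 / variance) at the assumed Fu-Li bound
    Var >= theta / sum_{k=1}^{n-1} 1/(k + theta), for one locus and n sequences."""
    return theta * sum(1.0 / (k + theta) for k in range(1, n))


def watterson_accuracy(theta: float, n: int) -> float:
    """Accuracy of Watterson's estimator S / a1 for one non-recombining locus,
    using Var(S) = a1*theta + a2*theta^2 (infinite-sites model)."""
    a1 = sum(1.0 / k for k in range(1, n))
    a2 = sum(1.0 / k**2 for k in range(1, n))
    variance = theta / a1 + a2 * theta**2 / a1**2
    return theta**2 / variance


def watterson_efficiency(theta: float, n: int) -> float:
    """Ratio of Watterson's accuracy to the Fu-Li bound (1.0 = fully efficient)."""
    return watterson_accuracy(theta, n) / fu_li_accuracy(theta, n)


def best_sample_size(theta_locus, budget, cost_per_individual, cost_per_read, n_max=100):
    """Hypothetical cost-per-read design: n individuals, each read once at every locus.
    Total cost = n * cost_per_individual + n * loci * cost_per_read <= budget.
    Loci are assumed unlinked, so total accuracy = loci * per-locus accuracy.
    Returns (n, loci, total_accuracy), or None if no design fits the budget."""
    best = None
    for n in range(2, n_max + 1):
        remaining = budget - n * cost_per_individual
        if remaining <= 0:
            break  # larger samples only leave less money for reads
        loci = math.floor(remaining / (n * cost_per_read))
        if loci < 1:
            continue
        total = loci * fu_li_accuracy(theta_locus, n)
        if best is None or total > best[2]:
            best = (n, loci, total)
    return best


if __name__ == "__main__":
    # Hypothetical numbers: per-locus Theta of 0.5 and a budget of 10,000 cost units.
    for c_ind in (0.0, 10.0, 100.0):
        print(c_ind, best_sample_size(0.5, 10_000, c_ind, 1.0))
    for n in (5, 20, 50):
        print(n, round(watterson_efficiency(2.0, n), 3))
```

Because accuracy is Θ² divided by the variance, the ratio of two accuracies is the usual relative efficiency, which is why watterson_efficiency is a simple quotient. Raising cost_per_individual in the loop leaves fewer affordable loci for each additional individual, which is the trade-off between more sequences and more loci that the abstract describes.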
Pages: 691-700
Number of pages: 10
Related papers (24 in total)
[1] Bahlo M, Griffiths RC. Inference from gene trees in a subdivided population. Theoretical Population Biology, 2000, 57(2): 79-95.
[2] Beerli P. Genetics, 1999, 152: 763.
[3] Beerli P, Felsenstein J. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98(8): 4563-4568.
[4] Edwards AWF. J Roy Stat Soc B, 1970, 32: 155.
[5] Felsenstein J. Phylogenies from molecular sequences: inference and reliability. Annual Review of Genetics, 1988, 22: 521-565.
[6] Felsenstein J. Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates. Genetics Research, 1992, 59(2): 139-147.
[7] Felsenstein J. Estimating effective population size from samples of sequences: a bootstrap Monte Carlo integration method. Genetics Research, 1992, 60(3): 209-220.
[8] Fu YX. Genetics, 1994, 136: 685.
[9] Fu YX. Genetics, 1993, 133: 693.
[10] Griffiths RC, Tavaré S. Ancestral inference in population genetics. Statistical Science, 1994, 9(3): 307-319.