Type I error and the power of the s-test: Old lessons from a new, analytically justified statistical test for phylogenies

Cited by: 3
Authors
Antezana, MA [1]
Hudson, RR [1]
Affiliations
[1] Univ Calif Irvine, Dept Ecol & Evolutionary Biol, Irvine, CA 92717 USA
Keywords
analytical; bootstrap; continuity; discreteness; Fisher's exact test; homoplasy; hypergeometric; informative sites; maximum likelihood; parallel changes; phylogeny; power; P-value; statistics; type I error
DOI
10.1080/106351599260300
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
We present a new procedure for assessing the statistical significance of the most likely unrooted dichotomous topology inferrable from four DNA sequences. The procedure directly calculates a P-value for the support given to this topology by the informative sites congruent with it, assuming the most likely star topology as the null hypothesis. Informative sites are crucial in the determination of the maximum likelihood dichotomous topology and are therefore an obvious target for a statistical test of phylogenies. Our P-value is the probability of producing, through parallel substitutions on the branches of the star topology, at least as much support as that given to the maximum likelihood dichotomous topology by the aforementioned informative sites, for any of the three possible dichotomous topologies. The degree of statistical significance is simply the complement of this P-value. Ours is therefore an a posteriori testing approach, in which no dichotomous topology is specified in advance. We implement the test for the case in which all sites behave identically and the substitution model has a single parameter. Under these conditions, the P-value can be easily calculated on the basis of the probabilities of change on the branches of the most likely star topology, because each site can become informative independently of every other site; accordingly, the total number of informative sites of each kind is binomially distributed. We explore the test's type I error by applying it to data produced in star topologies having all branches equally long, or having two short and two long branches, and various degrees of homoplasy.
The test is conservative, but we demonstrate, by means of a discreteness correction and progressively assumption-free calculations of the P-values, that (1) the conservativeness is mostly due to the discrete nature of informative sites and (2) the P-values calculated empirically are, moreover, mostly quite accurate in absolute terms. Applying the test to data produced in dichotomous topologies with increasing internal branch length shows that, despite the test's "conservativeness," its power is much higher than that of the bootstrap, especially when the relevant informative sites are few.
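The tail probability described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "support" is simply the count of informative sites of one kind, that each site independently shows at most one of the three informative patterns, and that the per-site probabilities under the star topology are supplied by the caller; the function names and the numeric values in the usage lines are hypothetical.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): tail for one kind of informative site."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def any_topology_tail(n, probs, k):
    """P(at least one of the three counts is >= k).

    Assumption (not from the source): the three counts are jointly
    multinomial over n independent sites, with per-site probabilities
    probs = (p1, p2, p3) and the remainder being uninformative sites.
    """
    p1, p2, p3 = probs
    q = 1.0 - (p1 + p2 + p3)  # probability that a site is uninformative
    if q < 0:
        raise ValueError("probabilities must sum to at most 1")
    all_below = 0.0  # accumulates P(all three counts < k)
    for x1 in range(min(k, n + 1)):
        for x2 in range(min(k, n - x1 + 1)):
            for x3 in range(min(k, n - x1 - x2 + 1)):
                rest = n - x1 - x2 - x3
                ways = comb(n, x1) * comb(n - x1, x2) * comb(n - x1 - x2, x3)
                all_below += ways * p1**x1 * p2**x2 * p3**x3 * q**rest
    return 1.0 - all_below


# Hypothetical inputs: 500 sites, each pattern arising with probability 0.004
# per site by parallel changes, and 7 observed sites for the best topology.
p_single = binom_tail(500, 0.004, 7)
p_any = any_topology_tail(500, (0.004, 0.004, 0.004), 7)
print(p_single, p_any)
```

Because no topology is specified in advance, the tail taken over any of the three topologies is never smaller than the tail for a single prespecified one, and it is bounded above by three times that value.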
Pages: 300-316
Number of pages: 17