On the interpretation of bootstrap trees: Appropriate threshold of clade selection and induced gain

Cited by: 112
Authors
Berry, V [1]
Gascuel, O [1]
Affiliations
[1] UNIV MONTPELLIER 2, CNRS, LIRMM, UMR 9928, F-34392 MONTPELLIER 5, FRANCE
Keywords
bootstrap method; threshold of clade selection; topological distance; Type I and Type II error; bias/variance compromise; maximum parsimony; neighbor joining; computer simulations
DOI
10.1093/molbev/13.7.999
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In this study we address the problem of interpreting a bootstrap tree. The main issue is choosing the threshold of clade selection in order to separate reliable clades from unreliable ones, depending on their bootstrap proportion. This threshold depends on the chosen error measure. We investigate error measures that stem from a generalization of Robinson and Foulds' (1981) distance, used to quantify the divergence between the true phylogeny and the estimated trees. We propose two analytical approximations of the optimum threshold of clade selection to interpret (i.e., reduce) the bootstrap tree. We performed extensive simulations along the lines of Kuhner and Felsenstein (1994) using the neighbor-joining and the maximum-parsimony methods. These simulations show that our approximations cause only small losses in quality when compared to the optimum threshold resulting from empirical observation. Next, we measured the error reduction achieved when estimating the true phylogeny by the properly reduced bootstrap tree rather than by the complete original tree, obtained with a classical tree-building method. Our simulations on short sequences show that an error reduction of 39% is achieved with the parsimony method and an error reduction of 33% is achieved with the distance method when the error is measured with the standard Robinson and Foulds distance. The observed error reduction is shown to originate from an important decrease in Type I error (wrong inferences), while Type II error (omitted correct clades) is only slightly increased. Greater error reduction is achieved when shorter sequences are used, and when more importance is given to Type I error than to Type II error. To investigate the causes of error from another point of view, we propose a general decomposition of the error expectation in two terms of bias, and one of variance. 
Results for these terms show that no fundamental bias is introduced by the bootstrap process, the only source of bias being structural (lack of resolution). Moreover, the variance in the estimations is greatly reduced, providing another explanation for the better results of the reduced bootstrap tree compared with the original tree estimate.
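The clade-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: trees are simplified to sets of clades, the support values are made-up toy numbers, and the weighted error (`w1`, `w2`) is only one assumed form of a generalized Robinson and Foulds distance, which reduces to the standard clade-count distance when both weights are 1.

```python
from typing import Dict, FrozenSet, Set, Tuple

Clade = FrozenSet[str]  # a clade is represented by the set of its taxa


def reduce_bootstrap_tree(support: Dict[Clade, float], threshold: float) -> Set[Clade]:
    """Keep only the clades whose bootstrap proportion reaches the threshold."""
    return {clade for clade, p in support.items() if p >= threshold}


def type_errors(estimated: Set[Clade], true_clades: Set[Clade]) -> Tuple[int, int]:
    """Return (Type I, Type II): wrong inferred clades, omitted correct clades."""
    type1 = len(estimated - true_clades)
    type2 = len(true_clades - estimated)
    return type1, type2


def weighted_error(estimated: Set[Clade], true_clades: Set[Clade],
                   w1: float = 1.0, w2: float = 1.0) -> float:
    """Assumed generalization: w1 * Type I + w2 * Type II (w1 = w2 = 1 gives RF)."""
    type1, type2 = type_errors(estimated, true_clades)
    return w1 * type1 + w2 * type2


# Toy data (hypothetical): true tree ((A,B),C),(D,E) and an estimated tree
# ((A,B),D),(C,E) with made-up bootstrap proportions for its clades.
true_clades = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
support = {frozenset("AB"): 0.95, frozenset("ABD"): 0.40, frozenset("CE"): 0.35}

full = set(support)                            # complete original tree
reduced = reduce_bootstrap_tree(support, 0.5)  # reduced bootstrap tree

print(type_errors(full, true_clades), weighted_error(full, true_clades))
print(type_errors(reduced, true_clades), weighted_error(reduced, true_clades))
```

In this toy case the reduction removes both wrong clades (Type I error drops from 2 to 0) and leaves Type II error unchanged. Raising `w1` relative to `w2` penalizes wrong inferences more heavily, which favors a higher threshold.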
Pages: 999-1011
Page count: 13
Related papers
18 entries in total
[1] EFRON, B. 1977 RIETZ LECTURE - BOOTSTRAP METHODS - ANOTHER LOOK AT THE JACKKNIFE [J]. ANNALS OF STATISTICS, 1979, 7 (01): 1-26
[2] Efron B, 1994, INTRO BOOTSTRAP, DOI 10.1201/9780429246593
[3] EFRON B, 1995, 179 STANDF U
[4] FELSENSTEIN, J. CASES IN WHICH PARSIMONY OR COMPATIBILITY METHODS WILL BE POSITIVELY MISLEADING [J]. SYSTEMATIC ZOOLOGY, 1978, 27 (04): 401-410
[5] FELSENSTEIN, J; KISHINO, H. IS THERE SOMETHING WRONG WITH THE BOOTSTRAP ON PHYLOGENIES - A REPLY [J]. SYSTEMATIC BIOLOGY, 1993, 42 (02): 193-200
[6] FELSENSTEIN J, 1985, EVOLUTION, V39, P783, DOI 10.1111/j.1558-5646.1985.tb00420.x
[7] Felsenstein J, 1993, PHYLIP (Phylogeny Inference Package) version 3.5c
[8] HILLIS, DM; BULL, JJ. AN EMPIRICAL-TEST OF BOOTSTRAPPING AS A METHOD FOR ASSESSING CONFIDENCE IN PHYLOGENETIC ANALYSIS [J]. SYSTEMATIC BIOLOGY, 1993, 42 (02): 182-192
[9] KENDALL MG, 1973, ADV THEORY STATISTIC, V2