Rapid evaluation of least-squares and minimum-evolution criteria on phylogenetic trees

被引:37
作者
Bryant, D [1 ]
Waddell, P [1 ]
机构
[1] Univ Canterbury, Biomath Res Ctr, Christchurch 1, New Zealand
关键词
least-squares method; minimum evolution; phylogenetic inference; tree statistics; algorithms;
D O I
10.1093/oxfordjournals.molbev.a025863
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
We present fast new algorithms for evaluating trees with respect to least squares and minimum evolution (ME), the most commonly used criteria for inferring phylogenetic trees from distance data. The new algorithms include an optimal O(N-2) time algorithm for calculating the edge (branch or internode) lengths on a tree according to ordinary or unweighted least squares (OLS); an O(N-3) time algorithm for edge lengths under weighted least squares (WLS) including the Fitch-Margoliash method; and an optimal O(N-4) time algorithm for generalized least-squares (GLS) edge lengths (where N is the number of taxa in the tree). The ME criterion is based on the sum of edge lengths. Consequently, the edge lengths algorithms presented here lead directly to O(N-2), O(N-3), and O(N-4) time algorithms for ME under OLS, WLS, and GLS, respectively. All of these algorithms are as fast as or faster than any of those previously published, and the algorithms for OLS and GLS are the fastest possible (with respect to order of computational complexity). A major advantage of our new methods is that they are as well adapted to multifurcating trees as they are to binary trees. An optimal algorithm for determining path lengths from a tree with given edge lengths is also developed. This leads to an optimal O(N-2) algorithm for OLS sums of squares evaluation and corresponding O(N-3) and O(N-4) time algorithms for WLS and GLS sums of squares, respectively. The GLS algorithm is time-optimal if the covariance matrix is already inverted. The speed of each algorithm is assessed analytically-the speed increases we calculate are confirmed by the dramatic speed increases resulting from their implementation in PAUP* 4.0. The new algorithms enable far more extensive tree searches and statistical evaluations (e.g., bootstrap, parametric bootstrap, or jackknife) in the same amount of time. Hopefully, the fast algorithms for WLS and GLS will encourage the use of these criteria for evaluating trees and their edge lengths (e.g., for approximate divergence time estimates), since they should be more statistically efficient than OLS.
引用
收藏
页码:1346 / 1359
页数:14
相关论文
共 44 条
[1]  
Agresti A., 1990, Analysis of categorical data
[2]  
[Anonymous], CONCEPTUAL NUMERICAL
[3]   ASYNCHRONOUS DISTANCE BETWEEN HOMOLOGOUS DNA-SEQUENCES [J].
BARRY, D ;
HARTIGAN, JA .
BIOMETRICS, 1987, 43 (02) :261-276
[4]  
Bryant David, 1997, PhD thesis
[5]  
BULMER M, 1991, MOL BIOL EVOL, V8, P868
[6]  
CAVALLISFORZA LL, 1967, EVOLUTION, V21, P550, DOI 10.1111/j.1558-5646.1967.tb03411.x
[8]  
FARRIS J S, 1985, Cladistics, V1, P67, DOI 10.1111/j.1096-0031.1985.tb00411.x
[9]   PHYLOGENIES FROM MOLECULAR SEQUENCES - INFERENCE AND RELIABILITY [J].
FELSENSTEIN, J .
ANNUAL REVIEW OF GENETICS, 1988, 22 :521-565
[10]   CASES IN WHICH PARSIMONY OR COMPATIBILITY METHODS WILL BE POSITIVELY MISLEADING [J].
FELSENSTEIN, J .
SYSTEMATIC ZOOLOGY, 1978, 27 (04) :401-410