Ultrafast Approximation for Phylogenetic Bootstrap

Cited by: 3470
Authors
Bui Quang Minh [1]
Minh Anh Thi Nguyen [2]
von Haeseler, Arndt [1]
Affiliations
[1] Med Univ Vienna, Univ Vienna, Max F Perutz Labs, Ctr Integrat Bioinformat Vienna, Vienna, Austria
[2] Univ Groningen, Groningen Bioinformat Ctr, Groningen, Netherlands
Funding
Austrian Science Fund;
Keywords
phylogenetic inference; nonparametric bootstrap; tree reconstruction; maximum likelihood; DNA-SEQUENCES; TREE-SPACE; MODEL; EVOLUTION; INFERENCE; PROTEIN; PERFORMANCE; SATURATION; CONFIDENCE; ALGORITHM;
DOI
10.1093/molbev/mst024
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.
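The resampling estimated log-likelihood (RELL) step named in the abstract can be illustrated with a minimal sketch. This is not the UFBoot implementation in IQ-TREE: it assumes the per-site log-likelihoods of a fixed set of candidate trees are already available, and it omits both the candidate-tree collection scheme and the stopping rule. The function name `rell_support`, its parameters, and the split labels are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def rell_support(site_lnl, tree_splits, n_replicates=1000, seed=0):
    """Estimate split support by RELL resampling (illustrative sketch only).

    site_lnl[t][s]  -- log-likelihood of alignment site s under candidate tree t
    tree_splits[t]  -- set of splits (any hashable labels) contained in tree t
    Returns a dict mapping each observed split to its support in [0, 1].
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    split_counts = Counter()

    for _ in range(n_replicates):
        # Draw a bootstrap alignment: sample site indices with replacement
        # and record how often each original site was drawn.
        weights = Counter(rng.choices(range(n_sites), k=n_sites))

        # Score every candidate tree by re-weighting its precomputed site
        # log-likelihoods; no parameters are re-optimized for the replicate.
        def resampled_lnl(t):
            return sum(w * site_lnl[t][s] for s, w in weights.items())

        best_tree = max(range(n_trees), key=resampled_lnl)

        # Every split in the replicate's best tree receives one vote.
        split_counts.update(tree_splits[best_tree])

    return {split: count / n_replicates for split, count in split_counts.items()}


# Toy input (made up): tree 0 is better than tree 1 at every site.
toy_lnl = [[-1.0] * 5, [-2.0] * 5]
toy_splits = [{"AB|CD"}, {"AC|BD"}]
print(rell_support(toy_lnl, toy_splits, n_replicates=100))
# {'AB|CD': 1.0}
```

Because each replicate only re-weights likelihoods that were computed once, the cost per replicate is a weighted sum per candidate tree rather than a full tree search, which is the source of the speedup that RELL-based approaches aim for.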
Pages: 1188-1195
Number of pages: 8