FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix

被引:3802
作者
Price, Morgan N. [1 ,2 ]
Dehal, Paramvir S. [1 ,2 ]
Arkin, Adam P. [1 ,2 ,3 ]
机构
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, Phys Biosci Div, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Lab, Virtual Inst Microbial Stress & Survival, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Bioengn, Berkeley, CA 94720 USA
关键词
minimum evolution; Neighbor-Joining; large phylogenies; NEIGHBOR-JOINING ALGORITHM; MAXIMUM-LIKELIHOOD; PHYLOGENETIC TREES; DATABASE; ACCURATE; RECONSTRUCTION; PRINCIPLE; FAMILIES;
D O I
10.1093/molbev/msp077
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N-2) space and O((NL)-L-2) time, but FastTree requires just O(NLa + N http://www.w3.org/1999) memory and O(N http://www.w3.org/1999 (N)La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
引用
收藏
页码:1641 / 1650
页数:10
相关论文
共 33 条
[1]   The MicrobesOnline web site for comparative genomics [J].
Alm, EJ ;
Huang, KH ;
Price, MN ;
Koche, RP ;
Keller, K ;
Dubchak, IL ;
Arkin, AP .
GENOME RESEARCH, 2005, 15 (07) :1015-1022
[2]  
Bininda-Emonds O R, 2001, Pac Symp Biocomput, P547
[3]   COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH [J].
DELONG, ER ;
DELONG, DM ;
CLARKEPEARSON, DI .
BIOMETRICS, 1988, 44 (03) :837-845
[4]   Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB [J].
DeSantis, T. Z. ;
Hugenholtz, P. ;
Larsen, N. ;
Rojas, M. ;
Brodie, E. L. ;
Keller, K. ;
Huber, T. ;
Dalevi, D. ;
Hu, P. ;
Andersen, G. L. .
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 2006, 72 (07) :5069-5072
[5]   Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle [J].
Desper, R ;
Gascuel, O .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (05) :687-705
[6]   Bootstrap confidence levels for phylogenetic trees (vol 93, pg 7085, 1996) [J].
Efron, B ;
Halloran, E ;
Holmes, S .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1996, 93 (23) :13429-13434
[7]   Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis [J].
Eisen, JA .
GENOME RESEARCH, 1998, 8 (03) :163-167
[8]  
Elias I, 2005, LECT NOTES COMPUT SC, V3580, P1263
[9]   Protein molecular function prediction by Bayesian phylogenomics [J].
Engelhardt, BE ;
Jordan, MI ;
Muratore, KE ;
Brenner, SE .
PLOS COMPUTATIONAL BIOLOGY, 2005, 1 (05) :432-445
[10]   Relaxed neighbor joining: A fast distance-based phylogenetic tree construction method [J].
Evans, J ;
Sheneman, L ;
Foster, J .
JOURNAL OF MOLECULAR EVOLUTION, 2006, 62 (06) :785-792