Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data

被引:118
作者
Stamatakis, Alexandros [1 ]
Alachiotis, Nikolaos [1 ]
机构
[1] Tech Univ Munich, Dept Comp Sci, Exelixis Lab 112, D-85748 Garching B Munchen, Germany
关键词
MAXIMUM-LIKELIHOOD; DNA-SEQUENCES; MIXED MODELS; PHYLOGENETICS; EVOLUTION;
D O I
10.1093/bioinformatics/btq205
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: The current molecular data explosion poses new challenges for large-scale phylogenomic analyses that can comprise hundreds or even thousands of genes. A property that characterizes phylogenomic datasets is that they tend to be gappy, i.e. can contain taxa with (many and disparate) missing genes. In current phylogenomic analyses, this type of alignment gappyness that is induced by missing data frequently exceeds 90%. We present and implement a generally applicable mechanism that allows for reducing memory footprints of likelihood-based [maximum likelihood (ML) or Bayesian] phylogenomic analyses proportional to the amount of missing data in the alignment. We also introduce a set of algorithmic rules to efficiently conduct tree searches via subtree pruning and re-grafting moves using this mechanism. Results: On a large phylogenomic DNA dataset with 2177 taxa, 68 genes and a gappyness of 90%, we achieve a memory footprint reduction from 9 GB down to 1 GB, a speedup for optimizing ML model parameters of 11, and accelerate the Subtree Pruning Regrafting tree search phase by factor 16. Thus, our approach can be deployed to improve efficiency for the two most important resources, CPU time and memory, by up to one order of magnitude.
引用
收藏
页码:i132 / i139
页数:8
相关论文
共 19 条
[1]  
[Anonymous], 2006, GENETIC ALGORITHM AP
[2]  
BERGER SA, 2009, LNCS IN PRESS
[3]   EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH [J].
FELSENSTEIN, J .
JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) :368-376
[4]   Introduction. Statistical and computational challenges in molecular phylogenetics and evolution [J].
Goldman, Nick ;
Yang, Ziheng .
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2008, 363 (1512) :3889-3892
[5]   A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood [J].
Guindon, S ;
Gascuel, O .
SYSTEMATIC BIOLOGY, 2003, 52 (05) :696-704
[6]   Assessing the root of bilaterian animals with scalable phylogenomic methods [J].
Hejnol, Andreas ;
Obst, Matthias ;
Stamatakis, Alexandros ;
Ott, Michael ;
Rouse, Greg W. ;
Edgecombe, Gregory D. ;
Martinez, Pedro ;
Baguna, Jaume ;
Bailly, Xavier ;
Jondelius, Ulf ;
Wiens, Matthias ;
Mueller, Werner E. G. ;
Seaver, Elaine ;
Wheeler, Ward C. ;
Martindale, Mark Q. ;
Giribet, Gonzalo ;
Dunn, Casey W. .
PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2009, 276 (1677) :4261-4270
[7]  
LARTILLOT N, 2007, PHYLOBAYES V2 3
[8]  
Pratas Frederico, 2009, Proceedings of the 2009 International Conference on Parallel Processing (ICPP 2009), P9, DOI 10.1109/ICPP.2009.30
[9]   FastTree 2-Approximately Maximum-Likelihood Trees for Large Alignments [J].
Price, Morgan N. ;
Dehal, Paramvir S. ;
Arkin, Adam P. .
PLOS ONE, 2010, 5 (03)
[10]   MrBayes 3: Bayesian phylogenetic inference under mixed models [J].
Ronquist, F ;
Huelsenbeck, JP .
BIOINFORMATICS, 2003, 19 (12) :1572-1574