Many-core algorithms for statistical phylogenetics

Cited by: 313
Authors
Suchard, Marc A. [1, 2, 3]
Rambaut, Andrew [4]
Affiliations
[1] Univ Calif Los Angeles, Dept Biomath, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[4] Univ Edinburgh, Inst Evolutionary Biol, Edinburgh EH9 3JT, Midlothian, Scotland
Funding
National Institutes of Health (NIH)
Keywords
CODON-SUBSTITUTION MODELS; MAXIMUM-LIKELIHOOD; NUCLEOTIDE SUBSTITUTION; DNA-SEQUENCES; RECONSTRUCTION; INFERENCE; TREES; GENES; RATES
DOI
10.1093/bioinformatics/btp244
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Statistical phylogenetics is computationally intensive, so considerable attention has been paid to techniques for parallelization. Codon-based models allow synonymous and replacement substitutions to occur at independent rates and can therefore model the evolution of protein-coding sequences more adequately, with a resulting increase in phylogenetic accuracy. Unfortunately, because codon models have a large number of states, their computational burden has largely thwarted phylogenetic reconstruction under them, particularly at the genomic scale. Here, we describe novel algorithms and methods for evaluating phylogenies under arbitrary molecular evolutionary models on graphics processing units (GPUs), using the large number of processing cores to parallelize calculations efficiently even for models with large state spaces. Results: We implement the approach in an existing Bayesian framework and apply the algorithms to estimate the phylogeny of 62 complete carnivore mitochondrial genomes under a 60-state codon model. We observe a nearly 90-fold speed increase over an optimized CPU-based computation and a more than 140-fold increase over the currently available implementation, making this the first practical use of codon models for phylogenetic inference over whole mitochondrial or microorganism genomes.
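The abstract does not spell out the algorithm, but a useful way to see why state count matters and why many cores help is the standard Felsenstein pruning step of likelihood-based phylogenetics. The sketch below is a plain-Python illustration of that textbook step, not the authors' GPU code; the function names `partials_for_parent` and `work_per_node`, the list-of-lists layout, and the 2-state toy values are hypothetical choices for illustration. It shows two things: every (site pattern, parent state) entry can be computed independently of the others, which is the kind of fine-grained independence a GPU can spread across its cores, and the work per node grows with the square of the number of states, so a 60-state codon model does 225 times the work of a 4-state nucleotide model per site pattern.

```python
from typing import List

Matrix = List[List[float]]


def partials_for_parent(p_left: Matrix, p_right: Matrix,
                        left: Matrix, right: Matrix) -> Matrix:
    """One Felsenstein pruning step for a parent node with two children.

    p_left[i][j]  : Pr(child state j | parent state i) along the left branch (S x S)
    p_right[i][j] : same for the right branch
    left[k][j]    : partial likelihood of the left child, site pattern k, state j
    right[k][j]   : same for the right child
    Returns parent[k][i] for every pattern k and parent state i.
    """
    n_patterns = len(left)
    n_states = len(p_left)
    parent = [[0.0] * n_states for _ in range(n_patterns)]
    # Each (k, i) cell depends only on the children's partials, never on other
    # parent cells, so a GPU implementation could assign one thread per cell.
    for k in range(n_patterns):
        for i in range(n_states):
            sum_left = sum(p_left[i][j] * left[k][j] for j in range(n_states))
            sum_right = sum(p_right[i][j] * right[k][j] for j in range(n_states))
            parent[k][i] = sum_left * sum_right
    return parent


def work_per_node(n_patterns: int, n_states: int) -> int:
    """Multiply-adds in the two inner sums above: 2 * patterns * states^2."""
    return 2 * n_patterns * n_states ** 2


if __name__ == "__main__":
    # Hypothetical 2-state toy: left tip observed in state 0, right tip in state 1.
    p = [[0.9, 0.1], [0.2, 0.8]]
    print(partials_for_parent(p, p, [[1.0, 0.0]], [[0.0, 1.0]]))
    # Codon (60 states) versus nucleotide (4 states) work per site pattern:
    print(work_per_node(1, 60) // work_per_node(1, 4))  # 225
```

The nested loops are written serially for clarity; the point of the sketch is that the two outer loops carry no dependencies between iterations, so they could be flattened into one parallel launch.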
Pages: 1370–1376
Number of pages: 7