Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures

Cited by: 73
Authors
Stamatakis, Alexandros [1]
Ott, Michael [2]
Affiliations
[1] Univ Munich, Dept Comp Sci, Exelixis Lab, D-80333 Munich, Germany
[2] Tech Univ Munich, Dept Comp Sci, D-85747 Garching, Germany
Keywords
phylogenetic inference; maximum likelihood; RAxML; multi-gene phylogenies; multi-core architectures; OpenMP
DOI
10.1098/rstb.2008.0163
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAxML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
Pages: 3977-3984
Page count: 8