Reconstruction of large phylogenetic trees: A parallel approach

被引:11
作者
Du, ZH
Lin, F
Roshan, UW
机构
[1] Nanyang Technol Univ, BioInformat Res Ctr, Singapore 639798, Singapore
[2] New Jersey Inst Technol, Dept Comp Sci, Coll Comp Sci, Newark, NJ 07102 USA
关键词
phylogenetic tree; maximum parsimony; parallel; divide-and-conquer; MIMD;
D O I
10.1016/j.compbiolchem.2005.06.003
中图分类号
Q [生物科学];
学科分类号
07 ; 0710 ; 09 ;
摘要
Reconstruction of phylogenetic trees for very large datasets is a known example of a computationally hard problem. In this paper, we present a parallel computing model for the widely used Multiple Instruction Multiple Data (MIMD) architecture. Following the idea of divide-and-conquer, our model adapts the recursive-DCM3 decomposition method [Roshan, U., Moret, B.M.E., Williams, T.L., Wainow, T, 2004a. Performance of suptertree methods on various dtaset decompositions. In: Binida-Emonds, O.R.P. (Eds.), Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, vol. 3 of Computational Biology, Kluwer Academics, pp. 301-328; Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T., 2004b. Rec-I-DCM3: A Fast Algorithmic Technique for reconstructing large phylogenetic trees, Proceedings of the IEEE Computational Systems Bioinformatics Conference (ICSB)] to divide datasets into smaller subproblems. It distributes computation load over multiple processors so that each processor constructs subtrees on each subproblem within a batch in parallel. It finally collects the resulting trees and merges them into a supertree. The proposed model is flexible as far as methods for dividing and merging datasets are concerned. We show that our method greatly reduces the computational time of the sequential version of the program. As a case study, our parallel approach only takes 22.1 h on four processors to outperform the best score to date (Found at 123.7 h by the Rec-I-DCM3 program [Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T, 2004a. Performance of suptertree methods on various dtaset decompositions. In: Binida-Emonds, O.R.P. (Eds.), Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, vol. 3 of Computational Biology, Kluwer Academics, pp. 301-328; Roshan, U., Morel, B.M.E., Williams, T.L., Warnow, T., 2004b. Rec-I-DCM3: A Fast Algorithmic Technique for reconstructing large phylogenetic trees, Proceedings of the IEEE Computational Systems Bioinformatics Conference (ICSB)] on one dataset. Developed with the standard message-passing library, MPI, the program can be recompiled and run on any MIMD systems. (c) 2005 Elsevier Ltd. All rights reserved.
引用
收藏
页码:273 / 280
页数:8
相关论文
共 27 条
[1]  
[Anonymous], P IEEE COMP SYST BIO
[2]   Weighted neighbor joining: A likelihood-based approach to distance-based phylogeny reconstruction [J].
Bruno, WJ ;
Socci, ND ;
Halpern, AL .
MOLECULAR BIOLOGY AND EVOLUTION, 2000, 17 (01) :189-197
[3]   Applied evolution [J].
Bull, JJ ;
Wichman, HA .
ANNUAL REVIEW OF ECOLOGY AND SYSTEMATICS, 2001, 32 :183-217
[4]   A METHOD FOR DEDUCING BRANCHING SEQUENCES IN PHYLOGENY [J].
CAMIN, JH ;
SOKAL, RR .
EVOLUTION, 1965, 19 (03) :311-326
[5]   Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle [J].
Desper, R ;
Gascuel, O .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (05) :687-705
[6]   EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH [J].
FELSENSTEIN, J .
JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) :368-376
[7]   BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data [J].
Gascuel, O .
MOLECULAR BIOLOGY AND EVOLUTION, 1997, 14 (07) :685-695
[8]  
Giribet G., 2005, SYST BIOL, V54, P176, DOI DOI 10.1080/10635150590905830
[9]  
Goloboff PA, 1999, CLADISTICS, V15, P415, DOI 10.1111/j.1096-0031.1999.tb00278.x
[10]   Allocating independent tasks to parallel processors: An experimental study [J].
Hagerup, T .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1997, 47 (02) :185-197