Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models

被引:37
作者
Bansal, Mukul S. [2 ]
Burleigh, J. Gordon [3 ]
Eulenstein, Oliver [1 ]
机构
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[2] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[3] Univ Florida, Dept Biol, Gainesville, FL 32611 USA
来源
BMC BIOINFORMATICS | 2010年 / 11卷
基金
美国国家科学基金会;
关键词
GENE TREES; MAXIMUM-LIKELIHOOD; SPECIES TREES; RECONSTRUCTION; INCONGRUENCE; HEURISTICS; LINEAGE;
D O I
10.1186/1471-2105-11-S1-S42
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of this data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage sorting (deep coalescence), or horizontal gene transfer, that cause incongruence among gene trees. One such approach is gene tree parsimony, which, given a set of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, the only existing algorithms for gene tree parsimony under the duplication-loss or deep coalescence reconciliation cost are prohibitively slow for large datasets. Results: We describe novel algorithms for SPR and TBR based local search heuristics under the duplication-loss cost, and we show how they can be adapted for the deep coalescence cost. These algorithms improve upon the best existing algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. We implemented our new SPR based local search algorithm for the duplication-loss cost and demonstrate the tremendous improvement in runtime and scalability it provides compared to existing implementations. We also evaluate the performance of our algorithm on three large-scale genomic data sets. Conclusion: Our new algorithms enable, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication-loss and deep coalescence reconciliation costs. Thus, this work expands both the size of data sets and the range of evolutionary models that can be incorporated into genome-scale phylogenetic analyses.
引用
收藏
页数:9
相关论文
共 36 条
[1]   Simultaneous Bayesian gene tree reconstruction and reconciliation analysis [J].
Akerborg, Oerjan ;
Sennblad, Bengt ;
Arvestad, Lars ;
Lagergren, Jens .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (14) :5714-5719
[2]   Bayesian estimation of concordance among gene trees (vol 24, pg 412, 2007) [J].
Ane, Cecile ;
Larget, Bret ;
Baum, David A. ;
Smith, Stacey D. ;
Rokas, Antonis .
MOLECULAR BIOLOGY AND EVOLUTION, 2007, 24 (07) :1575-1575
[3]  
ARVESTAD L, 2003, ISMB BIOINFORMATIC S, P7
[4]  
BANSAL MS, 2007, HEURISTICS GENEDUPLI, P238
[5]  
BANSAL MS, 2009, THESIS IOWA STATE U
[6]   An Ω (n2/log n) Speed-Up of TBR Heuristics for the Gene-Duplication Problem [J].
Bansal, Mukul S. ;
Eulenstein, Oliver .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2008, 5 (04) :514-524
[7]   Respirator Physiological Effects under Simulated Work Conditions [J].
Bansal, Siddharth ;
Harber, Philip ;
Yun, David ;
Liu, David ;
Liu, Yihang ;
Wu, Samantha ;
Ng, David ;
Santiago, Silverio .
JOURNAL OF OCCUPATIONAL AND ENVIRONMENTAL HYGIENE, 2009, 6 (04) :221-227
[8]   Reconciling a gene tree to a species tree under the duplication cost model [J].
Bonizzoni, P ;
Della Vedova, G ;
Dondi, R .
THEORETICAL COMPUTER SCIENCE, 2005, 347 (1-2) :36-53
[9]  
Bordewich M., 2005, Ann Comb, V8, P409, DOI [DOI 10.1007/S00026-004-0229-Z, 10.1007/s00026-004-0229-z]
[10]  
BURLEIGH JG, SYSTEMATIC IN PRESS