Comparing and aggregating partially resolved trees

被引:22
作者
Bansal, Mukul S. [1 ,2 ]
Dong, Jianrong [1 ]
Fernandez-Baca, David [1 ]
机构
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
基金
美国国家科学基金会;
关键词
Aggregation; Computational biology; Consensus; Hausdorff distance; Phylogenetic trees; Quartet distance; Triplet distance; QUARTETS MAXCUT; CONSENSUS; ALGORITHM; DISTANCE;
D O I
10.1016/j.tcs.2011.08.027
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Partially-resolved - that is, non-binary - trees arise frequently in the analysis of species evolution. Non-binary nodes, also called multifurcations, must be treated carefully, since they can be interpreted as reflecting either lack of information or actual evolutionary history. While several distance measures exist for comparing trees, none of them deal explicitly with this dichotomy. Here we introduce two kinds of distance measures between rooted and unrooted partially-resolved phylogenetic trees over the same set of species; the measures address multifurcations directly. For rooted trees, the measures are based on the topologies the input trees induce on triplets; that is, on three-element subsets of the set of species. For unrooted trees, the measures are based on quartets (four-element subsets). The first class of measures are parametric distances, where there is a parameter that weighs the difference between an unresolved triplet/quartet topology and a resolved one. The second class of measures are based on the Hausdorff distance, where each tree is viewed as a set of all possible ways in which the tree can be refined to eliminate unresolved nodes. We give efficient algorithms for computing parametric distances and give conditions under which Hausdorff distances can be calculated approximately in polynomial time. Additionally, we (i) derive the expected value of the parametric distance between two random trees, (ii) characterize the conditions under which parametric distances are near-metrics or metrics, (iii) study the computational and algorithmic properties of consensus tree methods based on the measures, and (iv) analyze the interrelationships among Hausdorff and parametric distances. (C) 2011 Elsevier B.V. All rights reserved.
引用
收藏
页码:6634 / 6652
页数:19
相关论文
共 50 条
[1]   N-TREES AS NESTINGS - COMPLEXITY, SIMILARITY, AND CONSENSUS [J].
ADAMS, EN .
JOURNAL OF CLASSIFICATION, 1986, 3 (02) :299-317
[2]  
Ailon Nir, 2005, P 37 ANN ACM S THEOR, P684, DOI DOI 10.1145/1060590.1060692
[3]  
Allen B. L., 2001, Annals of Combinatorics, V5, P1, DOI [10.1007/s00026-001-8006-8, DOI 10.1007/S00026-001-8006-8]
[4]  
[Anonymous], 1997, ACM SIGACT NEWS
[5]  
Bansal M.S., ARXIV09065089V1
[6]   THE MEDIAN PROCEDURE FOR N-TREES [J].
BARTHELEMY, JP .
JOURNAL OF CLASSIFICATION, 1986, 3 (02) :329-334
[7]  
Bartholdi John, 1989, SOCIAL CHOICE WELFAR, V6
[8]  
Berry V, 1999, LECT NOTES COMPUT SC, V1643, P313
[9]   Multipolar consensus for phylogenetic trees [J].
Bonnard, Cecile ;
Berry, Vincent ;
Lartillot, Nicolas .
SYSTEMATIC BIOLOGY, 2006, 55 (05) :837-843
[10]   Computing the quartet distance between evolutionary trees in time O(n log n) [J].
Brodal, GS ;
Fagerberg, R ;
Pedersen, CNS .
ALGORITHMICA, 2004, 38 (02) :377-395