Finding a maximum likelihood tree is hard

被引：31

作者：

Chor, Benny ^{[1
]}

Tuller, Tamir ^{[1
]}

机构：

[1] Tel Aviv Univ, IL-69978 Tel Aviv, Israel

来源：

JOURNAL OF THE ACM | 2006年 / 53卷 / 05期

关键词：

theory; algorithms; maximum likelihood; tree reconstruction; maximum parsimony; intractability; approximate vertex cover;

D O I：

10.1145/1183907.1183909

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees [Felsenstein 1981]. Finding optimal ML trees appears to be a very hard computational task, but for tractable cases, ML is the method of choice. In particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for the second major character based criterion, maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years [Foulds and Graham, 1982; Day et al. 1986], such a hardness result for ML has so far eluded researchers in the field. An important work by Tuffley and Steel [1997] proves quantitative relations between the parsimony values of given sequences and the corresponding log likelihood values. However, a direct application of their work would only give an exponential time reduction from MP to ML. Another step in this direction has recently been made by Addario-Berry et al. [2004], who proved that ancestral maximum likelihood (AML) is NP-complete. AML "lies in between" the two problems, having some properties of MP and some properties of ML. Still, the AML proof is not directly applicable to the ML problem. We resolve the question, showing that "regular" ML on phylogenetic trees is indeed intractable. Our reduction follows the vertex cover reductions for MP [Day et al. 1986] and AML [Addario-Berry et al. 2004], but its starting point is an approximation version of vertex cover, known as GAP VC. The crux of our work is not the reduction, but its correctness proof. The proof goes through a series of tree modifications, while controlling the likelihood losses at each step, using the bounds of Tuffley and Steel [1997]. The proof can be viewed as correlating the value of any ML solution to an arbitrarily close approximation to vertex cover.

引用

页码：722 / 744

页数：23

共 26 条

[1]

Addario-Berry Louigi, 2004, J Bioinform Comput Biol, V2, P257, DOI 10.1142/S0219720004000557

[2]

[Anonymous], 1971, STAT DECISION THEORY

[3]

BERMAN P, 1999, LECT NOTES COMPUTER

[4] Multiple maxima of likelihood in phylogenetic trees: An analytic approach [J].