ON THE APPROXIMATION OF SHORTEST COMMON SUPERSEQUENCES AND LONGEST COMMON SUBSEQUENCES

被引:124
作者
JIANG, T [1 ]
LI, M [1 ]
机构
[1] UNIV WATERLOO, DEPT COMP SCI, WATERLOO, ON N3L 3G1, CANADA
关键词
SHORTEST COMMON SUPERSEQUENCE; LONGEST COMMON SUBSEQUENCE; APPROXIMATION ALGORITHM; NP-HARDNESS; AVERAGE-CASE ANALYSIS; RANDOM SEQUENCE;
D O I
10.1137/S009753979223842X
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The problems of finding shortest common supersequences (SCS) and longest common subsequences (LCS) are two well-known NP-hard problems that have applications in many areas, including computational molecular biology, data compression, robot motion planning, and scheduling, text editing, etc. A lot of fruitless effort has been spent in searching for good approximation algorithms for these problems. In this paper, we show that these problems are inherently hard to approximate in the worst case. In particular, we prove that (i) SCS does not have a polynomial-rime linear approximation algorithm unless P = NP; (ii) There exists a constant delta > 0 such that, if SCS has a polynomial-time approximation algorithm with ratio log(delta) n, where ii is the number of input sequences, then NP is contained in DTIME((2Polylog) (n));(iii) There exists a constant delta > 0 such that, if LCS has a polynomial-time approximation algorithm with performance ratio n(d)elta, then P = NP. The proofs utilize the recent results of Arora et al. [Proc. 23rd IEEE Symposium oil Foundations of Computer Science, 1992, pp. 14-23] on the complexity of approximation problems. In the second part of the paper, we introduce a new method for analyzing the average-case performance of algorithms for sequences, based on Kolmogorov complexity. Despite the above nonapproximability results, we show that near optimal solutions for both SCS and LCS can be found on the average. More precisely, consider a fixed alphabet Sigma and suppose that the input sequences are generated randomly according to the uniform probability distribution and are of the same length n. Moreover, assume that the number of input sequences is polynomial in n. Then, there are simple greedy algorithms which approximate SCS and LCS with expected additive errors O(n(0.707)) and O(n(1/2+epsilon)) for epsilon > 0, respectively. Incidentally, our analyses also provide tight upper and lower bounds on the expected LCS and SCS lengths for a set of random sequences solving a generalization of another well-known open question on the expected LCS length for two random sequences [K. Alexander, The rare of convergence of the mean length of the longest common subsequence, 1992, manuscript], [V. Chvatal and D. Sankoff, J. Appl. Probab., 12 (1975), pp. 306-315], [D. Sankoff and J. Kruskall, eds., Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, 1983].
引用
收藏
页码:1122 / 1139
页数:18
相关论文
共 30 条
[1]  
Aho A. V., 1983, DATA STRUCTURES ALGO, V1st
[2]  
ALEXANDER K, 1992, RATE CONVERGENCE MEA
[3]  
ARORA S, 1992, AN S FDN CO, P14
[4]  
BLUM A, IN PRESS J ASS COMPU
[5]  
BLUM A, 1991, 23RD P ACM S THEOR C, P328
[6]  
CHVATAL V, 1975, J APPL PROBAB, V12
[7]  
DAYHOFF MO, 1969, SCI AM, V221, P87
[8]   THEORY AND ALGORITHMS FOR PLAN MERGING [J].
FOULSER, DE ;
LI, M ;
YANG, Q .
ARTIFICIAL INTELLIGENCE, 1992, 57 (2-3) :143-181
[9]  
FOULSER DE, 1986, THESIS STANFORD U
[10]  
HAYES CC, 1989, 11TH P INT JOINT C A, P949