Parameterized duplication in strings: Algorithms and an application to software maintenance

被引:92
作者
Baker, BS
机构
[1] Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974
关键词
string matching; pattern matching; duplication;
D O I
10.1137/S0097539793246707
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
As an aid in software maintenance, it would be useful to be able to track down duplication in large software systems efficiently. Duplication in code is often in the form of sections of code that are the same except for a systematic change of parameters such as identifiers and constants. To model such parameterized duplication in code, this paper introduces the notions of parameterized strings and parameterized matches of parameterized strings. A data structure called a parameterized suffix tree is defined to aid in searching for parameterized matches. For fixed alphabets, algorithms are given to construct a parameterized suffix tree in linear time and to find all maximal parameterized matches over a threshold length in a parameterized p-string in time linear in the size of the input plus the number of matches reported. The algorithms have been implemented, and experimental results show that they perform well on C code.
引用
收藏
页码:1343 / 1362
页数:20
相关论文
共 20 条
[1]  
Aho A.V., 1974, The Design and Analysis of Computer Algorithms
[2]  
Aho A.V., 1990, HDB THEORETICAL COMP, P255, DOI [10.1016/B978-0-444-88071-0.50010-2, DOI 10.1016/B978-0-444-88071-0.50010-2]
[3]  
Baker B. S., 1993, Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, P71, DOI 10.1145/167088.167115
[4]  
BAKER BS, 1992, COMPUTING SCIENCE AND STATISTICS : VOL 24, P49
[5]  
BAKER BS, IN PRESS ALGORITHMIC
[6]   SUBLINEAR APPROXIMATE STRING-MATCHING AND BIOLOGICAL APPLICATIONS [J].
CHANG, WI ;
LAWLER, EL .
ALGORITHMICA, 1994, 12 (4-5) :327-344
[7]  
Church K.W., 1993, J COMPUT GRAPH STAT, V2, P153, DOI [10.2307/1390697, DOI 10.2307/1390697, DOI 10.1080/10618600.1993.10474605, 10.1080/10618600.1993, DOI 10.1080/10618600.1993]
[8]  
Galil Z., 1988, Journal of Complexity, V4, P33, DOI 10.1016/0885-064X(88)90008-8
[9]  
Genesereth M.R., 1987, Logical foundations of artificial intelligence, V80, P100
[10]  
GIANCARLO R, 1993, PROCEEDINGS OF THE FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P402