Clone detection using abstract syntax trees

被引:614
作者
Baxter, ID [1 ]
Yahin, A [1 ]
Moura, L [1 ]
Sant'Anna, M [1 ]
Bier, L [1 ]
机构
[1] Semant Designs, Austin, TX 78759 USA
来源
INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, PROCEEDINGS | 1998年
关键词
software maintenance; clone detection; software evaluation; design maintenance system;
D O I
10.1109/ICSM.1998.738528
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Existing research suggests that a considerable fraction (5-10%) of the source code of Large-scale computer programs is duplicate code ("clones"). Detection and removal of such clones promises decreased software maintenance costs of possibly the same magnitude. Previous work was limited to detection of either near-misses differing only in single lexems, or near misses only between complete functions. This paper presents simple and practical methods for detecting exact and near miss clones over arbitrary program fragments in program source code by using abstract syntax trees. Previous work also did not suggest practical means for removing detected clones. Since our methods operate in terms of the program structure, clones could be removed by mechanical methods producing in-lined procedures or standard preprocessor macros. A tool using these techniques is applied to a C production software system of some 400K source lines, and the results confirm detected levels of duplication found by previous work. The tool produces macro bodies needed for clone removal, and macro invocations to replace the clones. The tool uses a variation of the well-known compiler method for detecting common sub-expressions. This method determines exact tree matches; a number of adjustments are needed to detect equivalent statement sequences. commutative operands, and nearly exact matches. We additionally suggest that clone detection could also be useful in producing more structured code, and in reverse engineering to discover domain concepts and their implementations.
引用
收藏
页码:368 / 377
页数:10
相关论文
共 11 条
[1]  
Aho Alfred V., 2007, COMPILERS PRINCIPLES
[2]  
BAKER B, WORK C REV ENG 1995
[3]  
BARSON P, 1995, P INT WORKSH APPL NE, V2
[4]  
BAXTER I, 1997, INT C SOFTW MAINT IE
[5]  
DEBAUD JM, 1997, WORK C REV ENG IEEE
[6]  
JOHNSON H, 1996, P CASCON 96
[7]  
Johnson J. H., 1994, P INT C SOFTW MAINT
[8]  
LAGUE B, 1997, INT C SOFTW MAINT IE
[9]  
Tomita Masaru, 1991, Generalized LR Parsing
[10]  
WAGNER T, 1997, P 1997 SIGPLAN C PRO