Shared information and program plagiarism detection

Cited by: 130
Authors
Chen, X [1]
Francia, B
Li, M
McKinnon, B
Seker, A
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 95202 USA
[2] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[3] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
Keywords
Kolmogorov complexity; program plagiarism detection; shared information
DOI
10.1109/TIT.2004.830793
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A fundamental question in information theory and in computer science is how to measure similarity, or the amount of shared information, between two sequences. We have proposed a metric, based on Kolmogorov complexity, to answer this question and have proven it to be universal. We apply this metric to measure the amount of shared information between two computer programs, which enables plagiarism detection. We have designed and implemented a practical system, SID (Software Integrity Diagnosis system), that approximates this metric with a heuristic compression algorithm. Experimental results demonstrate that SID has clear advantages over other plagiarism detection systems. The SID system server is online at http://software.bioinformatics.uwaterloo.ca/SID/.
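
The idea of approximating a Kolmogorov-complexity-based measure with a compressor can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the general-purpose `zlib` compressor stands in for SID's own heuristic compression algorithm, the conditional complexity K(x|y) is estimated as C(yx) - C(y), and the function names `compressed_size` and `shared_information_distance` are hypothetical. It is not the SID implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a rough stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def shared_information_distance(x: bytes, y: bytes) -> float:
    """Illustrative compression-based distance in [0, 1].

    Assumed form: 1 - (C(x) - C(x|y)) / C(xy), with C(x|y) estimated as
    C(yx) - C(y). Values near 0 suggest that most of x is already
    described by y; values near 1 suggest little shared information.
    """
    c_x = compressed_size(x)
    c_y = compressed_size(y)
    c_xy = compressed_size(x + y)
    c_yx = compressed_size(y + x)

    # Extra bytes needed to describe x once y is known (clamped at 0).
    conditional = max(c_yx - c_y, 0)
    # Estimated amount of information in x that is shared with y.
    shared = max(c_x - conditional, 0)

    distance = 1.0 - shared / c_xy
    return min(max(distance, 0.0), 1.0)
```

Under these assumptions, two near-identical source files should score lower than two unrelated files. A general-purpose compressor has a limited window and does not tokenize source code, so the scores are only indicative.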
Pages: 1545-1551
Page count: 7