Recovering traceability links between code and documentation

Cited by: 518
Authors
Antoniol, G
Canfora, G
Casazza, G
De Lucia, A
Merlo, E
Affiliations
[1] Univ Sannio, Res Ctr Software Technol, Dept Engn, I-82100 Benevento, Italy
[2] Univ Naples Federico II, Dept Informat & Sistemist, I-80125 Naples, Italy
[3] Ecole Politech, Dept Elect & Comp Engn, Montreal, PQ, Canada
Keywords
redocumentation; traceability; program comprehension; object orientation; information retrieval
DOI
10.1109/TSE.2002.1041053
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. We propose a method based on information retrieval to recover traceability links between source code and free text documents. A premise of our work is that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; therefore, the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We apply both a probabilistic and a vector space information retrieval model in two case studies to trace C++ source code onto manual pages and Java code to functional requirements. We compare the results of applying the two models, discuss the benefits and limitations, and describe directions for improvements.
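The vector space variant mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the TF-IDF weighting, the identifier-splitting rule, the function names (`split_identifier`, `tokenize`, `rank_documents`), and the sample requirements and class are all assumptions made for illustration, and stop-word removal and stemming are omitted for brevity. It treats a code unit as a query and ranks free-text documents by cosine similarity.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split an identifier such as 'openAccount' or 'loan_list' into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def tokenize(text):
    """Extract identifier-like tokens from code or prose and split them into words."""
    words = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        words.extend(split_identifier(token))
    return words


def rank_documents(code_text, documents):
    """Rank documents (name -> text) by cosine similarity to the code unit."""
    doc_tokens = {name: tokenize(text) for name, text in documents.items()}
    n = len(documents)

    # Document frequency of each term across the document collection.
    df = Counter()
    for toks in doc_tokens.values():
        df.update(set(toks))
    # Smoothed inverse document frequency (one common choice, assumed here).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vector(tokens):
        # Terms absent from the document vocabulary are ignored.
        tf = Counter(t for t in tokens if t in idf)
        return {t: f * idf[t] for t, f in tf.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query = vector(tokenize(code_text))
    scores = {name: cosine(query, vector(toks)) for name, toks in doc_tokens.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical functional requirements and a hypothetical Java class.
requirements = {
    "R1": "The librarian can add a new book to the catalogue.",
    "R2": "A user can borrow a book and return the loan later.",
}
java_class = (
    "class LoanManager { void borrowBook(User user, Book book); "
    "void returnLoan(Loan loan); }"
)
ranking = rank_documents(java_class, requirements)
```

In this toy setting the top-ranked entries are the candidate traceability links for the class; a threshold or cut-off on the ranked list would decide how many candidates to present to a developer.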
Pages: 970-983
Page count: 14