An empirical study of predicting software faults with case-based reasoning

Cited by: 49
Authors
Khoshgoftaar, Taghi M. [1]
Seliya, Naeem [1]
Sundaresh, Nandini [1]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Empir Software Engn Lab, Boca Raton, FL 33431 USA
Funding
National Aeronautics and Space Administration (NASA);
Keywords
software quality; case-based reasoning; software fault prediction; similarity functions; solution algorithm; software metrics;
DOI
10.1007/s11219-006-7597-z
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The resources allocated for software quality assurance and improvement have not increased with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach with case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The selection of a particular similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of using three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on prediction accuracy is also explored, and the benefits of using metric-selection procedures for our CBR system are evaluated. Case studies of a large legacy telecommunications system are used for our analysis. It is observed that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models have better performance than models based on multiple linear regression.
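The three components named in the abstract (a similarity function, a number of nearest neighbor cases, and a solution algorithm) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameter k, the use of NumPy, and the toy metric values are all assumptions made for illustration. It pairs the Mahalanobis distance with an inverse distance weighted average, the combination the abstract reports as most accurate.

```python
import numpy as np

def mahalanobis_distances(case_library, target):
    """Mahalanobis distance from `target` to every case (row) in the library.

    Hypothetical helper: the covariance is estimated from the case library
    itself, and a pseudo-inverse is used so that a singular covariance
    matrix does not raise an error.
    """
    cov = np.cov(case_library, rowvar=False)
    cov_inv = np.linalg.pinv(cov)
    diff = case_library - target
    # Row-wise quadratic form diff_i^T * cov_inv * diff_i.
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))

def predict_faults(case_library, faults, target, k=3):
    """Predict the fault count of `target` from its k nearest cases.

    Solution algorithm (assumed form): each neighbor's fault count is
    weighted by the inverse of its distance, normalized to sum to one.
    """
    distances = mahalanobis_distances(case_library, target)
    nearest = np.argsort(distances)[:k]
    d_k, f_k = distances[nearest], faults[nearest]
    exact = d_k == 0
    if np.any(exact):
        # An identical past module exists: use its fault count directly.
        return float(f_k[exact].mean())
    weights = 1.0 / d_k
    return float(np.sum(weights * f_k) / np.sum(weights))

# Toy case library: two made-up software metrics per module.
metrics = np.array([[10, 2], [20, 5], [30, 4], [40, 9], [50, 8]], dtype=float)
faults = np.array([0, 1, 1, 3, 4], dtype=float)

prediction = predict_faults(metrics, faults, np.array([25.0, 4.5]), k=2)
```

Because the weights are normalized, the prediction is always a weighted average of the neighbors' fault counts, so it never falls outside the range observed among the k selected cases. Replacing the weighted average with a plain mean of the k neighbors gives an unweighted solution algorithm for comparison.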
Pages: 85-111
Page count: 27