An empirical study of predicting software faults with case-based reasoning

Cited by: 49
Authors
Khoshgoftaar, Taghi M. [1]
Seliya, Naeem [1]
Sundaresh, Nandini [1]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Empir Software Engn Lab, Boca Raton, FL 33431 USA
Funding
National Aeronautics and Space Administration (NASA);
Keywords
software quality; case-based reasoning; software fault prediction; similarity functions; solution algorithm; software metrics;
DOI
10.1007/s11219-006-7597-z
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The resources allocated for software quality assurance and improvement have not kept pace with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach with case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The selection of a particular similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of using three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on the prediction accuracy is also explored, and the benefits of using metric-selection procedures for our CBR system are evaluated. Case studies of a large legacy telecommunications system are used for our analysis. We observe that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models perform better than models based on multiple linear regression.
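The three components named in the abstract (a similarity function, a number of nearest neighbor cases, and a solution algorithm) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mahalanobis` and `predict_faults`, the case-library layout, the toy data, and the exact weighting formula (weights proportional to 1/distance, normalized to sum to one) are assumptions made here for illustration. The inverse covariance matrix is passed in precomputed, and the toy example uses the identity matrix, which reduces the Mahalanobis distance to the Euclidean distance.

```python
import math


def mahalanobis(x, y, inv_cov):
    """Distance sqrt((x - y)^T S^-1 (x - y)), with S^-1 given as nested lists."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    # Row vector (x - y)^T times the inverse covariance matrix.
    tmp = [sum(diff[i] * inv_cov[i][j] for i in range(n)) for j in range(n)]
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, sum(t * d for t, d in zip(tmp, diff))))


def predict_faults(target, case_library, inv_cov, k=3):
    """Predict faults for `target` from its k nearest past cases.

    case_library is a list of (metrics, faults) pairs (hypothetical layout).
    The prediction is an inverse-distance-weighted average of the
    neighbors' fault counts.
    """
    scored = sorted(
        (mahalanobis(target, metrics, inv_cov), faults)
        for metrics, faults in case_library
    )
    nearest = scored[:k]
    # An exact match has distance zero; average such cases directly
    # instead of dividing by zero.
    exact = [faults for dist, faults in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / dist for dist, _ in nearest]
    total = sum(weights)
    return sum(w * faults for w, (_, faults) in zip(weights, nearest)) / total


# Toy case library: (software metrics, observed faults). Values are made up.
toy_cases = [((1.0, 0.0), 2), ((2.0, 0.0), 4), ((0.0, 4.0), 10), ((10.0, 10.0), 100)]
identity = [[1.0, 0.0], [0.0, 1.0]]
prediction = predict_faults((0.0, 0.0), toy_cases, identity, k=3)
```

In this sketch, closer cases contribute more to the prediction, and changing `k` or swapping the distance function corresponds to the design choices the study compares.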
Pages: 85-111
Page count: 27