Comparative assessment of software quality classification techniques: An empirical case study

Cited by: 113
Authors
Khoshgoftaar, TM [1]
Seliya, N [1]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Empir Software Engn Lab, Boca Raton, FL 33431 USA
Funding
National Aeronautics and Space Administration (NASA)
Keywords
software quality classification; decision trees; case-based reasoning; logistic regression; expected cost of misclassification; analysis of variance
DOI
10.1023/B:EMSE.0000027781.18360.9b
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality improvement efforts to modules that are likely to be fp during operations, thereby cost-effectively utilizing the software quality testing and enhancement resources. Since several classification techniques are available, a comparative study of some commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The expected cost of misclassification (ECM) is introduced as a single unified measure to compare the performances of different software quality classification models. A function of the costs of the Type I (an nfp module misclassified as fp) and Type II (an fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model is dependent on the system-specific costs of misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized complete block design modeling approach is used, in which the system release is treated as a block, while the modeling method is treated as a factor.
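As an illustration of why the cost ratio matters, the sketch below computes a per-module ECM for two hypothetical models. It assumes a commonly used formulation, ECM = C_I * Pr(fp|nfp) * pi_nfp + C_II * Pr(nfp|fp) * pi_fp, with the priors estimated from the data set; the abstract does not state the formula, and the function name, module counts, and error counts are all invented for the example.

```python
def expected_cost_of_misclassification(n_type1, n_type2, n_nfp, n_fp,
                                       cost_type1=1.0, cost_type2=1.0):
    """Per-module ECM under an assumed formulation (not taken from the abstract).

    n_type1: nfp modules misclassified as fp (Type I errors)
    n_type2: fp modules misclassified as nfp (Type II errors)
    n_nfp, n_fp: actual numbers of nfp and fp modules
    """
    total = n_nfp + n_fp
    type1_rate = n_type1 / n_nfp      # Pr(fp | nfp)
    type2_rate = n_type2 / n_fp       # Pr(nfp | fp)
    prior_nfp = n_nfp / total         # priors estimated from the data
    prior_fp = n_fp / total
    return (cost_type1 * type1_rate * prior_nfp
            + cost_type2 * type2_rate * prior_fp)


# Hypothetical data set: 900 nfp modules and 100 fp modules.
N_NFP, N_FP = 900, 100
# Hypothetical error counts (Type I, Type II) for two made-up models.
models = {"model_a": (180, 20), "model_b": (90, 40)}

for ratio in (1, 2, 5, 10, 20):       # cost ratio C_II / C_I, with C_I = 1
    costs = {
        name: expected_cost_of_misclassification(t1, t2, N_NFP, N_FP,
                                                 cost_type1=1.0,
                                                 cost_type2=float(ratio))
        for name, (t1, t2) in models.items()
    }
    best = min(costs, key=costs.get)
    print(f"C_II/C_I = {ratio:>2}: {costs}  preferred: {best}")
```

With these invented counts, the model that makes fewer Type I errors is cheaper at low cost ratios, while the model that makes fewer Type II errors becomes cheaper once a Type II error is priced high enough, which is why a comparison is only meaningful within the cost-ratio range relevant to the project.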
It is observed that the predictive performance of the models differs significantly across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
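The randomized complete block design mentioned in the abstract can be sketched as a two-way ANOVA without replication, with one observation per release (block) and modeling method (factor level). The code below is a minimal illustration under that assumption. The function name and the ECM values in the table are hypothetical and are not results from the paper, and only three methods are shown for brevity.

```python
def rcbd_anova(table):
    """Two-way ANOVA without replication for a randomized complete block design.

    table[i][j] is the response (for example, ECM) for block i (system
    release) and treatment j (modeling method). Returns the F statistics
    for treatments and blocks plus the degrees of freedom.
    """
    n_blocks = len(table)
    n_treat = len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_blocks * n_treat)

    block_means = [sum(row) / n_treat for row in table]
    treat_means = [sum(table[i][j] for i in range(n_blocks)) / n_blocks
                   for j in range(n_treat)]

    # Partition the total sum of squares into block, treatment, and error.
    ss_block = n_treat * sum((m - grand_mean) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand_mean) ** 2 for m in treat_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_block - ss_treat

    df_block = n_blocks - 1
    df_treat = n_treat - 1
    df_error = df_block * df_treat
    ms_error = ss_error / df_error

    return {
        "F_treatment": (ss_treat / df_treat) / ms_error,
        "F_block": (ss_block / df_block) / ms_error,
        "df": (df_treat, df_block, df_error),
    }


# Hypothetical ECM values: rows are four releases, columns are three methods.
ecm_table = [
    [0.30, 0.34, 0.41],
    [0.28, 0.31, 0.37],
    [0.35, 0.36, 0.45],
    [0.33, 0.39, 0.44],
]
print(rcbd_anova(ecm_table))
```

Each F statistic would then be compared against the F distribution with the corresponding degrees of freedom. A large block F value would correspond to the abstract's observation that performance differs across releases, and a large treatment F value would motivate the multiple pairwise comparisons between methods.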
Pages: 229-257
Number of pages: 29