Analogy-based practical classification rules for software quality estimation

Cited by: 70
Authors
Khoshgoftaar, TM [1]
Seliya, N [1]
Affiliation
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Empir Software Engn Lab, Boca Raton, FL 33431 USA
Funding
National Aeronautics and Space Administration (NASA);
Keywords
software reliability estimation; case-based reasoning; classification models; data clustering; majority voting; software metrics; multiple releases;
DOI
10.1023/A:1025316301168
Chinese Library Classification
TP31 [Computer Software];
Discipline Classification Codes
081202; 0835;
Abstract
Software metrics-based quality estimation models can be effective tools for classifying modules as either fault-prone or not fault-prone. Using such models prior to system deployment can considerably reduce the likelihood of faults discovered during operations, thereby improving system reliability. A software quality classification model is calibrated with metrics from a past release or a similar project and is then applied to modules currently under development, yielding a timely prediction of which modules are likely to have faults. In practice, however, software quality classification models may not provide a useful balance between the two misclassification rates, especially when the system being modeled contains very few faulty modules. This paper presents, in the context of case-based reasoning, two practical classification rules that allow appropriate emphasis on each type of misclassification according to project requirements. The suggested techniques are especially useful for high-assurance systems, where faulty modules are rare. The proposed generalized classification methods emphasize the costs of misclassification and account for the unbalanced distribution of faulty program modules. We illustrate the proposed techniques with a case study consisting of software measurements and fault data collected over multiple releases of a large-scale legacy telecommunication system. In addition to investigating the two classification methods, we present a brief comparison of the techniques. The level of classification accuracy and model robustness observed in the case study would be beneficial in achieving high software reliability in subsequent system releases. Similar observations were made in our empirical studies with other case studies.
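The generalized classification rule the abstract describes can be sketched in a few lines. This is an illustrative Python sketch, not the authors' exact method: a new module's metric vector is matched against a case library of past modules, and the fraction of fault-prone modules among its k nearest neighbors is compared with a tunable threshold. Lowering the threshold below the simple-majority value of 0.5 shifts emphasis toward catching rare fault-prone modules at the cost of more false alarms; the metric values and labels below are hypothetical.

```python
import numpy as np

def cbr_classify(case_library, labels, query, k=5, threshold=0.5):
    """Classify a module as fault-prone (1) or not fault-prone (0) by
    case-based reasoning: retrieve the k most similar past modules
    (Euclidean distance over metric vectors) and apply a generalized
    majority vote against a cost-driven threshold."""
    dists = np.linalg.norm(case_library - query, axis=1)  # similarity to each past case
    nearest = np.argsort(dists)[:k]                       # indices of k closest cases
    fault_fraction = labels[nearest].mean()               # share of fault-prone neighbors
    return 1 if fault_fraction >= threshold else 0

# Hypothetical metric data: [lines of code, cyclomatic complexity]
library = np.array([[120, 4], [300, 15], [80, 2], [450, 22], [95, 3]])
labels = np.array([0, 1, 0, 1, 0])  # 1 = fault-prone in a past release

# A low threshold (0.3) reflects a high cost for missing a faulty module.
print(cbr_classify(library, labels, np.array([400, 20]), k=3, threshold=0.3))  # -> 1
print(cbr_classify(library, labels, np.array([90, 3]), k=3, threshold=0.3))    # -> 0
```

In practice the threshold would be tuned on a past release so that the two misclassification rates reach the balance the project requires, which is the role the paper's generalized rules play.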
Pages: 325-350 (26 pages)