Balancing misclassification rates in classification-tree models of software quality

Cited by: 59
Authors
Khoshgoftaar T.M. [1]
Yuan X. [1]
Allen E.B. [1, 2]
Affiliations
[1] Florida Atlantic University, Boca Raton, FL
[2] Brown University, Providence, RI
Keywords
classification trees; CHAID; TREEDISC; telecommunications; software quality; fault-prone modules; software metrics; knowledge discovery in data bases
DOI
10.1023/A:1009896203228
Abstract
Software product and process metrics can be useful predictors of which modules are likely to have faults during operations. Developers and managers can use such predictions from software quality models to focus enhancement efforts before release. In practice, however, software quality modeling methods in the literature may not produce a useful balance between the two kinds of misclassification rates, especially when there are few faulty modules. This paper presents a practical classification rule, in the context of classification tree models, that allows appropriate emphasis on each type of misclassification according to the needs of the project. This is especially important when faulty modules are rare. An industrial case study using classification trees illustrates the tradeoffs. The trees were built using the TREEDISC algorithm, which is a refinement of the CHAID algorithm. We examined two releases of a very large telecommunications system and built models suited to two points in the development life cycle: the end of coding and the end of beta testing. Both trees had only five significant predictors, out of 28 and 42 candidates, respectively. We interpreted the structure of the classification trees and found that the models had useful accuracy.
Pages: 313-330
Page count: 17
Related papers
41 in total
[1] Basili V.R., Briand L.C., Melo W., A Validation of Object-Oriented Design Metrics as Quality Indicators, IEEE Transactions on Software Engineering, 22, 10, pp. 751-761, (1996)
[2] Briand L.C., Basili V.R., Hetmanski C.J., Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components, IEEE Transactions on Software Engineering, 19, 11, pp. 1028-1044, (1993)
[3] Ebert C., Classification Techniques for Metric-Based Software Development, Software Quality Journal, 5, 4, pp. 255-272, (1996)
[4] Ebert C., Experiences with Criticality Predictions in Software Development, Software Engineering - ESEC/FSE '97: Proceedings of the Sixth European Software Engineering Conference and the Fifth ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1301, pp. 278-293, (1997)
[5] ACM SIGSOFT Software Engineering Notes, 22, 6, (1997)
[6] Fayyad U.M., Data Mining and Knowledge Discovery: Making Sense out of Data, IEEE Expert, 11, 4, pp. 20-25, (1996)
[7] Gokhale S.S., Lyu M.R., Regression Tree Modeling for the Prediction of Software Quality, Proceedings of the Third ISSAT International Conference on Reliability and Quality in Design, pp. 31-36, (1997)
[8] Hand D.J., Data Mining: Statistics and More?, The American Statistician, 52, 2, pp. 112-118, (1998)
[9] Hawkins D.M., Kass G.V., Automatic Interaction Detection, Topics in Applied Multivariate Analysis, pp. 269-302, (1982)
[10] Henry S., Wake S., Predicting Maintainability with Software Quality Metrics, Journal of Software Maintenance: Research and Practice, 3, pp. 129-143, (1991)