Exploiting the Essential Assumptions of Analogy-Based Effort Estimation

Cited by: 111
Authors
Kocaguneli, Ekrem [1]
Menzies, Tim [1]
Bener, Ayse Basar [2]
Keung, Jacky W. [3]
Affiliations
[1] W Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
[2] Ryerson Univ, Ted Rogers Sch Informat Technol Management, Toronto, ON M5G 2C, Canada
[3] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
Funding
US National Science Foundation;
Keywords
Software cost estimation; analogy; k-NN; SOFTWARE EFFORT ESTIMATION; COST ESTIMATION; EMPIRICAL VALIDATION; SELECTION; MODEL;
DOI
10.1109/TSE.2011.27
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek aspects on general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation, i.e., the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of supertrees versus smaller subtrees. Results: For 10 data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: 1) The estimation variance of cluster subtrees is usually larger than that of cluster supertrees; 2) if analogy is restricted to the cluster trees with lower variance, then effort estimates have a significantly lower error (measured using MRE, AR, and Pred(25) with a Wilcoxon test, 95 percent confidence, compared to nearest neighbor methods that use neighborhoods of a fixed size). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
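The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the cluster tree is built, so everything below is an assumption made for illustration and not the authors' implementation: the median split on the most variable feature, the minimum cluster size of 4, the use of the cluster median as the estimate, and the names `build_tree`, `estimate`, `mre` and `pred`. The sketch builds a binary tree of clusters, then walks a new project down the tree and stops as soon as the next subtree has a larger effort variance than the current one. MRE and Pred(25) follow their usual definitions.

```python
from statistics import median, pvariance


def build_tree(rows):
    """Build a binary cluster tree from rows of (features, effort)."""
    node = {"rows": rows, "var": pvariance([e for _, e in rows]), "kids": []}
    if len(rows) < 4:  # hypothetical minimum size for a further split
        return node
    # Assumption: split at the median of the feature with the widest spread.
    dims = range(len(rows[0][0]))
    d = max(dims, key=lambda i: pvariance([f[i] for f, _ in rows]))
    ordered = sorted(rows, key=lambda r: r[0][d])
    mid = len(ordered) // 2
    node["kids"] = [build_tree(ordered[:mid]), build_tree(ordered[mid:])]
    return node


def centroid(rows):
    """Mean feature vector of a cluster."""
    n = len(rows)
    return [sum(f[i] for f, _ in rows) / n for i in range(len(rows[0][0]))]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def estimate(tree, features):
    """Descend towards the nearest subtree while effort variance does not grow."""
    node = tree
    while node["kids"]:
        kid = min(node["kids"], key=lambda k: dist(centroid(k["rows"]), features))
        if kid["var"] > node["var"]:
            break  # the subtree is less stable than its supertree: stop here
        node = kid
    return median([e for _, e in node["rows"]])


def mre(actual, predicted):
    """Magnitude of relative error."""
    return abs(actual - predicted) / actual


def pred(pairs, level=25):
    """Percentage of (actual, predicted) pairs with MRE at or below level percent."""
    hits = sum(1 for a, p in pairs if mre(a, p) <= level / 100)
    return 100.0 * hits / len(pairs)


# Toy data: one size feature per project, effort in arbitrary units.
projects = [([1.0], 10.0), ([2.0], 600.0), ([10.0], 700.0), ([11.0], 720.0)]
tree = build_tree(projects)
small = estimate(tree, [1.5])   # nearest subtree is noisy, so the walk stops at the root
large = estimate(tree, [10.5])  # nearest subtree is stable, so the walk descends
```

On this toy data the query near the two small projects is answered from the whole data set, because its nearest subtree has a higher effort variance than the root, while the query near the two large projects is answered from their own low-variance cluster. The neighborhood size is therefore chosen per query rather than fixed in advance.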
Pages: 425-438
Page count: 14