A Large-Scale Empirical Study of Just-in-Time Quality Assurance

Cited by: 484
Authors
Kamei, Yasutaka [1, 2]
Shihab, Emad [3 ]
Adams, Bram [4 ]
Hassan, Ahmed E. [5 ]
Mockus, Audris [6 ]
Sinha, Anand [7 ]
Ubayashi, Naoyasu [1, 2]
Affiliations
[1] Kyushu Univ, Grad Sch, Nishi Ku, Fukuoka 8190395, Japan
[2] Kyushu Univ, Fac Informat Sci & Elect Engn, Nishi Ku, Fukuoka 8190395, Japan
[3] Rochester Inst Technol, Dept Software Engn, Rochester, NY 14623 USA
[4] Ecole Polytech, Dept Genie Informat & Genie Logiciel, Montreal, PQ H3C 3A7, Canada
[5] Queens Univ, Sch Comp, Kingston, ON K7L 3N6, Canada
[6] Avaya Labs Res, Basking Ridge, NJ 07920 USA
[7] Mabels Labels, Dundas, ON L9H 3R2, Canada
Funding
Japan Society for the Promotion of Science;
Keywords
Maintenance; software metrics; mining software repositories; defect prediction; just-in-time prediction; OPEN SOURCE SOFTWARE; CODE CHURN; METRICS; CLASSIFICATION; PREDICTION; VALIDATION; MODELS; RISK;
DOI
10.1109/TSE.2012.70
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Defect prediction models are a well-known technique for identifying defect-prone files or packages such that practitioners can allocate their quality assurance efforts (e.g., testing and code reviews). However, once the critical files or packages have been identified, developers still need to spend considerable time drilling down to the functions or even code snippets that should be reviewed or tested. This makes the approach too time-consuming and impractical for large software systems. Instead, we consider defect prediction models that identify defect-prone ("risky") software changes rather than files or packages. We refer to this type of quality assurance activity as "Just-In-Time Quality Assurance," because developers can review and test these risky changes while they are still fresh in their minds (i.e., at check-in time). To build a change risk model, we use a wide range of factors based on the characteristics of a software change, such as the number of added lines and developer experience. A large-scale study of six open source and five commercial projects from multiple domains shows that our models can predict whether or not a change will lead to a defect with an average accuracy of 68 percent and an average recall of 64 percent. Furthermore, when considering the effort needed to review changes, we find that using only 20 percent of the effort it would take to inspect all changes, we can identify 35 percent of all defect-inducing changes. Our findings indicate that "Just-In-Time Quality Assurance" may provide an effort-reducing way to focus on the most risky changes and thus reduce the costs of developing high-quality software.
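The effort-aware result in the abstract (a share of defect-inducing changes found within 20 percent of the total inspection effort) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `recall_at_effort`, the use of a per-change effort value such as lines changed, and the ranking by risk per unit of effort are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Sequence


def recall_at_effort(
    risk: Sequence[float],
    effort: Sequence[int],
    defective: Sequence[int],
    budget: float = 0.2,
) -> float:
    """Share of defect-inducing changes found within an effort budget.

    Assumptions (hypothetical, not taken from the abstract):
    - risk[i] is a model's predicted risk score for change i,
    - effort[i] is the cost of inspecting change i (e.g., lines changed),
    - defective[i] is 1 if change i introduced a defect, else 0,
    - changes are inspected in descending order of risk per unit effort.
    """
    total_effort = sum(effort)
    total_defects = sum(defective)
    if total_effort == 0 or total_defects == 0:
        return 0.0

    # Rank changes so that cheap, high-risk changes are inspected first.
    order = sorted(
        range(len(risk)),
        key=lambda i: risk[i] / max(effort[i], 1),
        reverse=True,
    )

    spent = 0
    found = 0
    for i in order:
        # Stop as soon as the next change would exceed the budget.
        if spent + effort[i] > budget * total_effort:
            break
        spent += effort[i]
        found += defective[i]
    return found / total_defects


if __name__ == "__main__":
    # Toy data only; these numbers are not results from the study.
    scores = [0.9, 0.2, 0.8, 0.1]
    costs = [10, 50, 5, 35]
    labels = [1, 0, 1, 1]
    print(recall_at_effort(scores, costs, labels))
```

With real data, `budget=0.2` would correspond to the 20 percent effort level mentioned above; other ranking rules (for example, by risk score alone) would give different values.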
Pages: 757-773
Number of pages: 17