Combining Deep Learning with Information Retrieval to Localize Buggy Files for Bug Reports

Cited by: 141
Authors
An Ngoc Lam [1]
Anh Tuan Nguyen [1]
Hoan Anh Nguyen [1]
Tien N. Nguyen [1]
Affiliations
[1] Iowa State Univ, Ames, IA 50011 USA
Source
2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) | 2015
Funding
National Science Foundation (USA)
DOI
10.1109/ASE.2015.73
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Bug localization refers to the automated process of locating the potentially buggy files for a given bug report. Helping developers focus their attention on those files is crucial. Several existing automated approaches for bug localization from a bug report face a key challenge, called lexical mismatch, in which the terms used in bug reports to describe a bug are different from the terms and code tokens used in source files. This paper presents a novel approach that uses a deep neural network (DNN) in combination with rVSM, an information retrieval (IR) technique. rVSM collects the feature on the textual similarity between bug reports and source files. The DNN is used to learn to relate the terms in bug reports to potentially different code tokens and terms in source files and documentation if they appear frequently enough in the pairs of reports and buggy files. Our empirical evaluation on real-world projects shows that DNN and IR complement each other well and achieve higher bug localization accuracy than the individual models. Importantly, our new model, HyLoc, which combines the features built from the DNN, rVSM, and the project's bug-fixing history, achieves higher accuracy than the state-of-the-art IR and machine learning techniques. In half of the cases, it is correct with just a single suggested file. In two out of three cases, a correct buggy file is in the list of three suggested files.
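The idea in the abstract of combining a textual-similarity feature with a learned relevance feature, then checking whether a buggy file lands in the top k suggestions, can be illustrated with a minimal sketch. This is not HyLoc or rVSM: plain TF-IDF cosine similarity stands in for the IR feature, the `learned_scores` dictionary is a hypothetical placeholder for whatever a trained model would output, and the linear weighting with `alpha` is an assumption made only for illustration. All function names, file names, and values below are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(report, files, learned_scores, alpha=0.5):
    """Rank file names by a weighted sum of textual similarity and a learned score.

    `learned_scores` is a hypothetical stand-in for a model's output;
    `alpha` is an assumed mixing weight, not a value from the paper.
    """
    names = list(files)
    vectors = tfidf_vectors([files[n] for n in names] + [report])
    report_vec = vectors[-1]
    combined = {}
    for name, vec in zip(names, vectors):
        text_sim = cosine(report_vec, vec)
        combined[name] = alpha * text_sim + (1 - alpha) * learned_scores.get(name, 0.0)
    return sorted(names, key=lambda n: combined[n], reverse=True)


def top_k_hit(ranked, buggy_files, k):
    """Return True if any truly buggy file appears among the first k suggestions."""
    return any(name in buggy_files for name in ranked[:k])


# Invented example: the report shares no terms with any file (lexical mismatch),
# so only the placeholder learned score separates the candidates.
files = {
    "Parser.java": "parse token stream syntax error",
    "Cache.java": "cache evict entry memory store",
    "Net.java": "socket connect timeout retry",
}
report = "application hangs under heavy load"
learned = {"Cache.java": 0.8, "Net.java": 0.3}
ranked = rank_files(report, files, learned)
print(ranked, top_k_hit(ranked, {"Cache.java"}, k=1))
```

In this toy case the textual similarity is zero for every file, which is the lexical-mismatch situation the abstract describes, and the ranking is decided entirely by the second feature. When the report and a file do share terms, both features contribute to the combined score.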
Pages: 476-481
Page count: 6