The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs

Cited by: 213
Authors
Le Goues, Claire [1 ]
Holtschulte, Neal [2 ]
Smith, Edward K. [3 ]
Brun, Yuriy [3 ]
Devanbu, Premkumar [4 ]
Forrest, Stephanie [2 ]
Weimer, Westley [5 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
[3] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
[4] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
[5] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22904 USA
Funding
National Science Foundation (USA);
Keywords
Automated program repair; benchmark; subject defect; reproducibility; MANYBUGS; INTROCLASS;
DOI
10.1109/TSE.2015.2454513
Chinese Library Classification
TP31 [Computer Software];
Subject classification codes
081202 ; 0835 ;
Abstract
The field of automated software repair lacks a set of common benchmark problems. Although benchmark sets are used widely throughout computer science, existing benchmarks are not easily adapted to the problem of automatic defect repair, which has several special requirements. Most important of these is the need for benchmark programs with reproducible, important defects and a deterministic method for assessing whether those defects have been repaired. This article details the need for a new set of benchmarks, outlines requirements, and then presents two datasets, MANYBUGS and INTROCLASS, consisting between them of 1,183 defects in 15 C programs. Each dataset is designed to support the comparative evaluation of automatic repair algorithms asking a variety of experimental questions. The datasets have empirically defined guarantees of reproducibility and benchmark quality, and each study object is categorized to facilitate qualitative evaluation and comparisons by category of bug or program. The article presents baseline experimental results on both datasets for three existing repair methods, GenProg, AE, and TrpAutoRepair, to reduce the burden on researchers who adopt these datasets for their own comparative evaluations.
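The "deterministic method for assessing" a repair that the abstract refers to is, in test-suite-based benchmarks of this kind, a check against two groups of tests: positive tests the original program already passed, and negative tests that expose the defect. A minimal sketch of that check follows; the helper names and the idea of driving tests as shell commands are illustrative assumptions, not the actual MANYBUGS or INTROCLASS harness:

```python
import subprocess

def passes(cmd, timeout=10):
    """Run one test command; treat exit code 0 as a pass.
    A hung test (timeout) is treated as a failure."""
    try:
        return subprocess.run(cmd, shell=True, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def is_repaired(positive_tests, negative_tests):
    """A candidate patch counts as a repair only if it still passes
    every previously passing (positive) test AND now passes the
    tests that originally exposed the defect (negative tests)."""
    return (all(passes(t) for t in positive_tests)
            and all(passes(t) for t in negative_tests))
```

The two-sided condition matters: requiring the negative tests alone would accept patches that "fix" the bug by breaking existing behavior, which is why benchmark suites of this kind ship both test groups with each defect scenario.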
Pages: 1236-1256 (21 pages)