Solving the Search for Source Code

被引:89
作者
Stolee, Kathryn T. [1 ]
Elbaum, Sebastian [2 ]
Dobos, Daniel [2 ]
机构
[1] Iowa State Univ, Ames, IA 50011 USA
[2] Univ Nebraska, Lincoln, NE USA
关键词
Verification; Languages; Experimentation; Semantic code search; symbolic analysis; SMT solvers; lightweight specification; SYMBOLIC EXECUTION;
D O I
10.1145/2581377
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Programmers frequently search for source code to reuse using keyword searches. The search effectiveness in facilitating reuse, however, depends on the programmer's ability to specify a query that captures how the desired code may have been implemented. Further, the results often include many irrelevant matches that must be filtered manually. More semantic search approaches could address these limitations, yet existing approaches are either not flexible enough to find approximate matches or require the programmer to define complex specifications as queries. We propose a novel approach to semantic code search that addresses several of these limitations and is designed for queries that can be described using a concrete input/output example. In this approach, programmers write lightweight specifications as inputs and expected output examples. Unlike existing approaches to semantic search, we use an SMT solver to identify programs or program fragments in a repository, which have been automatically transformed into constraints using symbolic analysis, that match the programmer-provided specification. We instantiated and evaluated this approach in subsets of three languages, the Java String library, Yahoo! Pipes mashup language, and SQL select statements, exploring its generality, utility, and trade-offs. The results indicate that this approach is effective at finding relevant code, can be used on its own or to filter results from keyword searches to increase search precision, and is adaptable to find approximate matches and then guide modifications to match the user specifications when exact matches do not already exist. These gains in precision and flexibility come at the cost of performance, for which underlying factors and mitigation strategies are identified.
引用
收藏
页数:45
相关论文
共 50 条
[1]  
[Anonymous], 2011, P 19 ACM SIGSOFT S 1
[2]  
[Anonymous], 2009, P 4 INT C COMMUNITIE, DOI DOI 10.1145/1556460.1556489
[3]  
[Anonymous], 2009, P 18 ACM C INF KNOWL
[4]  
[Anonymous], 2007, P INT C DAT ENG, DOI DOI 10.1109/ICDE.2007.367896
[5]   A 15 YEAR PERSPECTIVE ON AUTOMATIC PROGRAMMING [J].
BALZER, R .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1985, 11 (11) :1257-1268
[6]  
Bjorner N, 2009, LECT NOTES COMPUT SC, V5505, P307, DOI 10.1007/978-3-642-00768-2_27
[7]  
Chan Wing-Kwan, 2012, P 20 ACM SIGSOFT INT
[8]  
Clarke L. A., 1976, IEEE Transactions on Software Engineering, VSE-2, P215, DOI 10.1109/TSE.1976.233817
[9]   APPLICATIONS OF SYMBOLIC EVALUATION [J].
CLARKE, LA ;
RICHARDSON, DJ .
JOURNAL OF SYSTEMS AND SOFTWARE, 1985, 5 (01) :15-35
[10]  
Cottrell Rylan., 2008, P INT S FDN SOFTWARE, P214, DOI [10.1145/1453101.1453130, DOI 10.1145/1453101.1453130]