Code Completion with Statistical Language Models

Cited by: 74
Authors
Raychev, Veselin [1]
Vechev, Martin [1]
Yahav, Eran [2]
Affiliations
[1] Swiss Federal Institute of Technology, Zurich, Switzerland
[2] Technion, Haifa, Israel
DOI
10.1145/2666356.2594321
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081205 [Computer Software];
Abstract
We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls. Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments. Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.
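The pipeline described in the abstract (extract call sequences, index them into a language model, rank candidate sentences for a hole) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which language model is used, so a bigram model with add-one smoothing is swapped in here, and the training sequences, method names, and the functions `train_bigrams`, `log_prob`, and `complete` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical call sequences, standing in for what a static analysis
# might extract per object from a large codebase (assumed data).
SEQUENCES = [
    ["open", "setAudioSource", "setOutputFormat", "prepare", "start"],
    ["open", "setOutputFormat", "prepare", "start", "stop"],
    ["open", "setAudioSource", "prepare", "start", "stop", "release"],
]


def train_bigrams(sequences):
    """Count unigram contexts and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        tokens = ["<s>"] + seq + ["</s>"]
        unigrams.update(tokens[:-1])          # every token that acts as a context
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {t for seq in sequences for t in seq} | {"</s>"}
    return unigrams, bigrams, vocab


def log_prob(sentence, model):
    """Log-probability of a whole call sequence under add-one smoothing."""
    unigrams, bigrams, vocab = model
    tokens = ["<s>"] + sentence + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab)))
    return total


def complete(prefix, suffix, model, top_k=3):
    """Rank single-call fillers for the hole between prefix and suffix."""
    _, _, vocab = model
    candidates = [c for c in vocab if c != "</s>"]
    # Sort by descending sentence probability; ties are broken by name.
    ranked = sorted(candidates, key=lambda c: (-log_prob(prefix + [c] + suffix, model), c))
    return ranked[:top_k]


model = train_bigrams(SEQUENCES)
print(complete(["open", "setOutputFormat"], ["start"], model))
```

The sketch fills a hole with a single call on one object. The approach in the abstract goes further: it synthesizes sequences of calls across multiple objects, together with their arguments.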
Pages: 419-428
Page count: 10