A Survey of Machine Learning for Big Code and Naturalness

Cited by: 560
Authors
Allamanis, Miltiadis [1]
Barr, Earl T. [2]
Devanbu, Premkumar [3]
Sutton, Charles [4, 5]
Affiliations
[1] Microsoft Res, 21 Stn Rd, Cambridge CB1 2FB, England
[2] UCL, Dept Comp Sci, Gower St, London WC1E 6BT, England
[3] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
[4] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
[5] Alan Turing Inst, London, England
Funding
Engineering and Physical Sciences Research Council (UK); National Research Foundation, Singapore
Keywords
Big code; code naturalness; software engineering tools; machine learning;
DOI
10.1145/3212695
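Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
080201 [Mechanical Manufacturing and Automation];
Abstract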
Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.
Pages: 37