code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present code2seq: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of compositional paths in its abstract syntax tree (AST) and uses attention to select the relevant paths while decoding. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as state-of-the-art NMT models. An interactive online demo of our model is available at http://code2seq.org. Our code, data and trained models are available at http://github.com/tech-srl/code2seq.
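The idea of representing a snippet as the set of paths between the leaves of its AST can be illustrated with a short sketch. This is not the authors' extractor, which is a separate, language-specific tool. The sketch assumes Python's standard `ast` module as a stand-in parser. It treats only names, parameters and literals as leaves, which is a simplification. It also uses a hypothetical `max_length` cap of 8 edges. Each result is a (start token, path, end token) triple, where ↑ marks a step up toward the lowest common ancestor and ↓ marks a step down toward the second leaf.

```python
import ast
from itertools import combinations


def _leaves(node, prefix=()):
    """Yield (token, root_to_leaf_nodes) pairs in source order.

    Simplification for this sketch: only identifiers (ast.Name),
    parameters (ast.arg) and literals (ast.Constant) count as leaves.
    """
    prefix = prefix + (node,)
    if isinstance(node, ast.Name):
        yield node.id, prefix
    elif isinstance(node, ast.arg):
        yield node.arg, prefix
    elif isinstance(node, ast.Constant):
        yield repr(node.value), prefix
    else:
        for child in ast.iter_child_nodes(node):
            yield from _leaves(child, prefix)


def _path_between(p1, p2):
    """Return (path_string, num_edges) for the route leaf1 -> LCA -> leaf2."""
    # Length of the shared root prefix; p1[k - 1] is the lowest common ancestor.
    k = 0
    while k < min(len(p1), len(p2)) and p1[k] is p2[k]:
        k += 1
    up = [type(n).__name__ for n in reversed(p1[k - 1:])]  # leaf1 ... LCA
    down = [type(n).__name__ for n in p2[k:]]                 # below LCA ... leaf2
    path = "↑".join(up) + "".join("↓" + name for name in down)
    return path, (len(p1) - k) + (len(p2) - k)


def extract_paths(source, max_length=8):
    """Return (start_token, path, end_token) triples for every pair of leaves.

    max_length (hypothetical default) drops paths with more AST edges than this.
    """
    tree = ast.parse(source)
    leaves = list(_leaves(tree))
    contexts = []
    for (tok1, p1), (tok2, p2) in combinations(leaves, 2):
        path, length = _path_between(p1, p2)
        if length <= max_length:
            contexts.append((tok1, path, tok2))
    return contexts


if __name__ == "__main__":
    for ctx in extract_paths("def add(a, b):\n    return a + b\n"):
        print(ctx)
```

Running the module prints six triples for `add`. One example is `('a', 'arg↑arguments↑FunctionDef↓Return↓BinOp↓Name', 'a')`, which links the parameter `a` to its use in the return expression. In the model described above, such paths would then be encoded and attended over during decoding. That part is not shown here.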


