Recent advances in natural language processing \cite{gpt2, BERT} have led to near-human performance on a variety of natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly structured environment with strict syntax rules. Specifically, we propose an end-to-end machine learning model for code generation in the Python language, built on top of pre-trained language models. We demonstrate that a fine-tuned model can perform well on code generation tasks, achieving a BLEU score of 0.22, an improvement of 46\% over a reasonable sequence-to-sequence baseline. All results and the code used for training and data processing are available on GitHub.