编码:代码填充和合成生成模型 (InCoder: A Generative Model for Code Infilling and Synthesis) - 专知论文

会员服务 ·

0

代码 · Performer · 生成模型 · MoDELS · HTTPS ·

2022 年 4 月 17 日

InCoder: A Generative Model for Code Infilling and Synthesis

翻译：编码:代码填充和合成生成模型

Daniel Fried,Armen Aghajanyan,Jessy Lin,Sida Wang,Eric Wallace,Freda Shi,Ruiqi Zhong,Wen-tau Yih,Luke Zettlemoyer,Mike Lewis

from arxiv, 25 pages, 13 figures. v2: added NeoX-20B results & StackOverflow corpus info

Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. The InCoder models and code are publicly released. https://sites.google.com/view/incoder-code-models

翻译：代码很少以单左对右通过写成,而是反复编辑和完善。我们引入了 InCoder,这是一个统一的基因模型,可以进行程序合成(通过左对右一代)和编辑(填充)和编辑(通过填充)。 InCoder 受过培训,可以从大量许可使用的代码中生成代码文件,其中代码区域被随机遮盖,并移动到每个文件的末尾,允许代码填充双向环境。我们的模型是第一个能够直接执行零射代码填充的基因模型,我们评估了具有挑战性的任务,例如类型推断、评论生成和变量重命名。我们发现,以双向环境为条件的能力大大改进了这些任务的业绩,同时仍然在标准程序合成基准上与在类似规模上预先训练的仅左向右模式相对。 InCoder 模型和代码被公开发布。 https://sites.google.com/view/incoder-coded-models。

0

相关内容

代码（Code）是专知网的一个重要知识资料文档板块，旨在整理收录论文源代码、复现代码，经典工程代码等，便于用户查阅下载使用。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

专知会员服务

138+阅读 · 2022年2月6日

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

KCuxSy热电材料声子和载流子传输特性的协同调控

国家自然科学基金

0+阅读 · 2015年12月31日

COL6A1在去势抵抗性前列腺癌肿瘤微环境中的调控机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

小麦粒重主效QTL的精细定位及候选基因克隆和功能鉴定

国家自然科学基金

0+阅读 · 2013年12月31日

西瓜果皮色泽相关基因的筛选、分离及克隆

国家自然科学基金

0+阅读 · 2012年12月31日

苹果转录因子MdTGA2.1的抗病功能及对PR基因的调控作用

国家自然科学基金

0+阅读 · 2012年12月31日

木霉诱导下杨树ARF转录因子对其生长及抗病的分子调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

柑橘绿霉病菌对DMI杀菌剂抗性的调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

不结球白菜BcWRKY1转录因子抗霜霉病功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

多敏感属性微数据发布隐私保护关键技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

小麦旗叶衰老相关基因的克隆及功能分析

国家自然科学基金

0+阅读 · 2009年12月31日

Drawing out of Distribution with Neuro-Symbolic Generative Models

Arxiv

1+阅读 · 2022年6月3日

Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code

Arxiv

0+阅读 · 2022年6月2日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Arxiv

11+阅读 · 2021年4月29日

Attention, please! A survey of Neural Attention Models in Deep Learning

Arxiv

59+阅读 · 2021年3月31日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

X-BERT: eXtreme Multi-label Text Classification with BERT

X-BERT: eXtreme Multi-label Text Classification with BERT

Arxiv

12+阅读 · 2019年7月4日

A Robust Real-Time Automatic License Plate Recognition based on the YOLO Detector

Arxiv

13+阅读 · 2018年3月1日

A Structured Self-attentive Sentence Embedding

Arxiv

24+阅读 · 2017年3月9日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

【干货书】机器学习设计模式，408页pdf，Machine Learning Design Patterns

专知会员服务

138+阅读 · 2022年2月6日

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ACL2025教程】大语言模型的护栏与安全性：对其应用的安全、可靠与可控引导

《实现协同自主：从人机协作到多智能体系统》最新190页

【ICML2025】SToFM：一种用于空间转录组学的多尺度基础模型

通信网络智能体白皮书V1.0，61页pdf

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Drawing out of Distribution with Neuro-Symbolic Generative Models

Arxiv

1+阅读 · 2022年6月3日

Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code

Arxiv

0+阅读 · 2022年6月2日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Arxiv

11+阅读 · 2021年4月29日

Attention, please! A survey of Neural Attention Models in Deep Learning

Arxiv

59+阅读 · 2021年3月31日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

X-BERT: eXtreme Multi-label Text Classification with BERT

X-BERT: eXtreme Multi-label Text Classification with BERT

Arxiv

12+阅读 · 2019年7月4日

A Robust Real-Time Automatic License Plate Recognition based on the YOLO Detector

Arxiv

13+阅读 · 2018年3月1日

A Structured Self-attentive Sentence Embedding

Arxiv

24+阅读 · 2017年3月9日

相关基金

KCuxSy热电材料声子和载流子传输特性的协同调控

国家自然科学基金

0+阅读 · 2015年12月31日

COL6A1在去势抵抗性前列腺癌肿瘤微环境中的调控机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

小麦粒重主效QTL的精细定位及候选基因克隆和功能鉴定

国家自然科学基金

0+阅读 · 2013年12月31日

西瓜果皮色泽相关基因的筛选、分离及克隆

国家自然科学基金

0+阅读 · 2012年12月31日

苹果转录因子MdTGA2.1的抗病功能及对PR基因的调控作用

国家自然科学基金

0+阅读 · 2012年12月31日

木霉诱导下杨树ARF转录因子对其生长及抗病的分子调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

柑橘绿霉病菌对DMI杀菌剂抗性的调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

不结球白菜BcWRKY1转录因子抗霜霉病功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

多敏感属性微数据发布隐私保护关键技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

小麦旗叶衰老相关基因的克隆及功能分析

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员