Cross-Domain Deep Code Search with Few-Shot Meta Learning - Zhuanzhi Papers


CodeBERT · MoDELS · Learning · Few-Shot Learning · Code

December 3, 2022

Cross-Domain Deep Code Search with Few-Shot Meta Learning


Yitian Chai,Hongyu Zhang,Beijun Shen,Xiaodong Gu

From arXiv. Accepted by ICSE 2022 (the 44th International Conference on Software Engineering).

Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their strong performance, they rely on large amounts of parallel data to fine-tune the semantic mappings between queries and code. This limits their practicality for domain-specific languages, where such data is relatively scarce and expensive to obtain. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and then adapted to domain-specific languages such as SQL and Solidity. Unlike cross-language CodeBERT, which is fine-tuned directly in the target language, CDCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of the model parameters, one that can be reused most effectively in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, SQL and Solidity, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are fine-tuned directly in domain-specific languages, and that it is particularly effective when data is scarce.
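The MAML idea the abstract refers to, learning an initialization that performs well after only a few gradient steps on a new task, can be illustrated on a toy problem. The sketch below is not the authors' implementation: it replaces the pre-trained code model and the query/code matching objective with a hypothetical two-parameter linear regressor, where each "task" (standing in for a language or domain) is a line y = a*x + b. The names `sample_task`, `adapt`, `meta_grad`, `maml`, the task ranges, and the inner and outer step sizes `alpha` and `beta` are all illustrative assumptions. Because the loss is quadratic, the second-order MAML term (I - alpha * H) can be computed exactly without an autodiff library.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]          # (w, c) for the toy model y_hat = w * x + c
Batch = List[Tuple[float, float]]     # list of (x, y) pairs
Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


def loss(p: Params, data: Batch) -> float:
    """Mean squared error of the linear model on one batch."""
    w, c = p
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)


def grad(p: Params, data: Batch) -> Params:
    """Analytic gradient of the MSE with respect to (w, c)."""
    w, c = p
    n = len(data)
    gw = sum(2 * (w * x + c - y) * x for x, y in data) / n
    gc = sum(2 * (w * x + c - y) for x, y in data) / n
    return gw, gc


def hessian(data: Batch) -> Matrix:
    """For a linear model the MSE Hessian is constant: 2 * mean([[x*x, x], [x, 1]])."""
    n = len(data)
    hxx = sum(2 * x * x for x, _ in data) / n
    hx = sum(2 * x for x, _ in data) / n
    return (hxx, hx), (hx, 2.0)


def adapt(p: Params, support: Batch, alpha: float) -> Params:
    """Inner loop: one gradient step on a task's small support set."""
    gw, gc = grad(p, support)
    return p[0] - alpha * gw, p[1] - alpha * gc


def meta_grad(p: Params, support: Batch, query: Batch, alpha: float) -> Params:
    """Second-order MAML gradient: (I - alpha * H_support) @ grad_query(adapted p).

    H_support is symmetric, so the Jacobian of `adapt` equals its own transpose.
    """
    qw, qc = grad(adapt(p, support, alpha), query)
    (hww, hwc), (hcw, hcc) = hessian(support)
    return (qw - alpha * (hww * qw + hwc * qc),
            qc - alpha * (hcw * qw + hcc * qc))


def sample_task(rng: random.Random, k: int) -> Tuple[Batch, Batch]:
    """Hypothetical task family y = a*x + b, split into k support and k query points."""
    a, b = rng.uniform(2.0, 4.0), rng.uniform(1.0, 2.0)
    pts = [(x, a * x + b) for x in (rng.uniform(-1.0, 1.0) for _ in range(2 * k))]
    return pts[:k], pts[k:]


def post_adaptation_loss(p: Params, tasks: List[Tuple[Batch, Batch]], alpha: float) -> float:
    """Average query loss after adapting p to each task's support set."""
    return sum(loss(adapt(p, s, alpha), q) for s, q in tasks) / len(tasks)


def maml(steps: int = 500, tasks_per_step: int = 5, k: int = 5,
         alpha: float = 0.1, beta: float = 0.1, seed: int = 0) -> Params:
    """Outer loop: move the shared initialization along the averaged meta-gradient."""
    rng = random.Random(seed)
    p: Params = (0.0, 0.0)
    for _ in range(steps):
        tasks = [sample_task(rng, k) for _ in range(tasks_per_step)]
        grads = [meta_grad(p, s, q, alpha) for s, q in tasks]
        gw = sum(g[0] for g in grads) / len(grads)
        gc = sum(g[1] for g in grads) / len(grads)
        p = (p[0] - beta * gw, p[1] - beta * gc)
    return p


if __name__ == "__main__":
    rng = random.Random(42)
    held_out = [sample_task(rng, 5) for _ in range(50)]
    init = maml()
    print("meta-learned init:", init)
    print("query MSE after one step, zero init:", post_adaptation_loss((0.0, 0.0), held_out, 0.1))
    print("query MSE after one step, MAML init:", post_adaptation_loss(init, held_out, 0.1))
```

Comparing the two printed losses on held-out tasks shows the point of the meta-learned starting point: after the same single adaptation step, it fits unseen tasks far better than a naive zero initialization. In a setting like the paper's, the parameters would instead belong to a pre-trained code model and the inner and outer losses would score query/code pairs; dropping the Hessian term (first-order MAML) is a common simplification when second derivatives are expensive.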

