CrossCodeBench:制定《源码模型》跨任务一般化基准 (CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models) - 专知论文

会员服务 ·

0

泛化理论 · MoDELS · Learning · INFORMS · 代码 ·

2023 年 2 月 8 日

CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models

翻译：CrossCodeBench:制定《源码模型》跨任务一般化基准

Changan Niu,Chuanyi Li,Vincent Ng,Bin Luo

from arxiv, ICSE 2023

Despite the recent advances showing that a model pre-trained on large-scale source code data is able to gain appreciable generalization capability, it still requires a sizeable amount of data on the target task for fine-tuning. And the effectiveness of the model generalization is largely affected by the size and quality of the fine-tuning data, which is detrimental for target tasks with limited or unavailable resources. Therefore, cross-task generalization, with the goal of improving the generalization of the model to unseen tasks that have not been seen before, is of strong research and application value. In this paper, we propose a large-scale benchmark that includes 216 existing code-related tasks. Then, we annotate each task with the corresponding meta information such as task description and instruction, which contains detailed information about the task and a solution guide. This also helps us to easily create a wide variety of ``training/evaluation'' task splits to evaluate the various cross-task generalization capabilities of the model. Then we perform some preliminary experiments to demonstrate that the cross-task generalization of models can be largely improved by in-context learning methods such as few-shot learning and learning from task instructions, which shows the promising prospects of conducting cross-task learning research on our benchmark. We hope that the collection of the datasets and our benchmark will facilitate future work that is not limited to cross-task generalization.

翻译：尽管最近取得了一些进展,表明对大规模源代码数据进行预先培训的模型能够取得明显的概括化能力,但仍需要大量关于目标任务的数据,以进行微调。而且模型的概括性效力在很大程度上受到微调数据的规模和质量的影响,而微调数据对资源有限或缺乏资源的目标任务有害。因此,交叉任务概括化,目的是改进模型的概括化,使之适应以前未曾见过的隐蔽任务,因此具有很强的研究和应用价值。在本文件中,我们提出了一个包括216项现有与代码有关的任务的大规模基准。然后,我们用相应的元信息来说明每项任务,例如任务说明和指示,其中载有关于任务和解决办法指南的详细信息。这也帮助我们容易地建立各种各样的“培训/评价”任务分工,以评价模型的各种跨任务概括化能力。然后,我们进行一些初步试验,以证明通过诸如微调的学习和教学等可大大改进模型的交叉任务。我们从任务描述和学习前景的基准,我们从未来的基准学习的有希望的基准,我们从任务学习到没有希望的基准,我们从任务学习有希望的交叉任务。

0

相关内容

泛化理论

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

中低温固体氧化物燃料电池核-壳结构阴极的研究

国家自然科学基金

0+阅读 · 2014年12月31日

新型KSi储氢合金的制备及性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

周期性脉冲电热效应下环氧树脂绝缘材料的老化机理与寿命评估

国家自然科学基金

0+阅读 · 2012年12月31日

有机胺为模板的金属(II)硫酸盐铁电材料的合成及性质研究

国家自然科学基金

0+阅读 · 2012年12月31日

lncRNA-UCA1通过PKM2参与膀胱癌细胞Warburg效应的机制

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

缺氧诱导NHE1参与calpain介导ABCA1降解及胆固醇逆转运障碍

国家自然科学基金

0+阅读 · 2012年12月31日

母体慢性应激对子代注意缺陷多动障碍发生的影响及机制

国家自然科学基金

0+阅读 · 2011年12月31日

用分子对接及SPR法检测尿液中膀胱癌肿瘤标志物

国家自然科学基金

0+阅读 · 2009年12月31日

Editing Models with Task Arithmetic

Arxiv

0+阅读 · 2023年3月29日

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Arxiv

0+阅读 · 2023年3月29日

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

Arxiv

2+阅读 · 2023年3月29日

Multi-lingual Evaluation of Code Generation Models

Arxiv

0+阅读 · 2023年3月28日

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Arxiv

0+阅读 · 2023年3月28日

Benchmarks for Automated Commonsense Reasoning: A Survey

Arxiv

44+阅读 · 2023年2月22日

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Arxiv

25+阅读 · 2023年2月20日

iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability

Arxiv

17+阅读 · 2021年6月25日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

最新，DeepSeek-R1论文登上Nature封面，附83页补充材料

人工智能与未来战争

自动驾驶中的轨迹预测大型基础模型：全面综述

万字长文《对抗雷达系统的电子战综述》

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Editing Models with Task Arithmetic

Arxiv

0+阅读 · 2023年3月29日

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Arxiv

0+阅读 · 2023年3月29日

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

Arxiv

2+阅读 · 2023年3月29日

Multi-lingual Evaluation of Code Generation Models

Arxiv

0+阅读 · 2023年3月28日

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Arxiv

0+阅读 · 2023年3月28日

Benchmarks for Automated Commonsense Reasoning: A Survey

Arxiv

44+阅读 · 2023年2月22日

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Arxiv

25+阅读 · 2023年2月20日

iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability

Arxiv

17+阅读 · 2021年6月25日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

相关基金

中低温固体氧化物燃料电池核-壳结构阴极的研究

国家自然科学基金

0+阅读 · 2014年12月31日

新型KSi储氢合金的制备及性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

周期性脉冲电热效应下环氧树脂绝缘材料的老化机理与寿命评估

国家自然科学基金

0+阅读 · 2012年12月31日

有机胺为模板的金属(II)硫酸盐铁电材料的合成及性质研究

国家自然科学基金

0+阅读 · 2012年12月31日

lncRNA-UCA1通过PKM2参与膀胱癌细胞Warburg效应的机制

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

缺氧诱导NHE1参与calpain介导ABCA1降解及胆固醇逆转运障碍

国家自然科学基金

0+阅读 · 2012年12月31日

母体慢性应激对子代注意缺陷多动障碍发生的影响及机制

国家自然科学基金

0+阅读 · 2011年12月31日

用分子对接及SPR法检测尿液中膀胱癌肿瘤标志物

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员