MTG: A Benchmark Suite for Multilingual Text Generation

June 10, 2022

Yiran Chen, Zhenqiao Song, Xianze Wu, Danqing Wang, Jingjing Xu, Jiaze Chen, Hao Zhou, Lei Li

From arXiv; Findings of NAACL 2022

We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first multilingual multiway text generation dataset with the largest human-annotated data (400k). It covers four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese). The multiway setup enables testing a model's ability to transfer knowledge across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite fosters model performance enhancement with more human-annotated parallel data, and it provides comprehensive evaluations with diverse generation scenarios. Code and data are available at https://github.com/zide05/MTG.
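
To make the size of the multiway setup concrete, the following minimal sketch enumerates every task and ordered source-target language combination implied by the four tasks and five languages listed above. The language codes and the function name `cross_lingual_settings` are illustrative assumptions, not part of the MTG code base, and the directions actually evaluated by the authors may be a subset of these.

```python
from itertools import permutations

# Tasks and languages as listed in the abstract.
TASKS = [
    "story generation",
    "question generation",
    "title generation",
    "text summarization",
]
# Two-letter codes are our own shorthand (assumption), not MTG identifiers.
LANGUAGES = ["en", "de", "fr", "es", "zh"]


def cross_lingual_settings(tasks, languages):
    """Return every (task, source, target) triple with source != target."""
    return [
        (task, src, tgt)
        for task in tasks
        for src, tgt in permutations(languages, 2)
    ]


settings = cross_lingual_settings(TASKS, LANGUAGES)
# 4 tasks * (5 * 4) ordered language pairs = 80 combinations
print(len(settings))
```

Adding the five monolingual directions per task would bring the total to 100 task-direction settings.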
