使变革者解决组成任务 (Making Transformers Solve Compositional Tasks) - 专知论文

会员服务 ·

0

泛化理论 · 变换 · Transformer模型 · MoDELS · 可辨认的 ·

2021 年 8 月 9 日

Making Transformers Solve Compositional Tasks

翻译：使变革者解决组成任务

Santiago Ontañón,Joshua Ainslie,Vaclav Cvicek,Zachary Fisher

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in a diverse set of compositional tasks, and that achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).

翻译：一些研究报告说,变形模型无法对组成进行概括化,这是许多非产物分类等非产物分类任务中一种关键的一般化类型。在本文中,我们探索了变形模型的设计空间,表明若干设计决定给模型的暗示偏差显著地影响到组成上的概括化。通过这一探索,我们确定了变形模型的配置,这些变形模型在多种组成任务中比文献中报告的范围要大得多,并在一个语义分类一般化基准(COGS)和一个字符串编辑操作构成基准(PCFG)中实现了最新的结果。

0

相关内容

泛化理论

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

320+阅读 · 2020年11月26日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

专知会员服务

51+阅读 · 2020年11月19日

Transformer模型-深度学习自然语言处理，17页ppt

Transformer模型-深度学习自然语言处理，17页ppt

专知会员服务

107+阅读 · 2020年8月30日

元学习与图神经网络逻辑推导，55页ppt

元学习与图神经网络逻辑推导，55页ppt

专知会员服务

129+阅读 · 2020年4月25日

BERT进展2019四篇必读论文

BERT进展2019四篇必读论文

专知会员服务

69+阅读 · 2020年1月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

Iterative Decoding for Compositional Generalization in Transformers

Iterative Decoding for Compositional Generalization in Transformers

Arxiv

0+阅读 · 2021年10月8日

Towards Continual Knowledge Learning of Language Models

Arxiv

0+阅读 · 2021年10月7日

Long-tailed Distribution Adaptation

Arxiv

1+阅读 · 2021年10月6日

Learning to Iteratively Solve Routing Problems with Dual-Aspect Collaborative Transformer

Arxiv

0+阅读 · 2021年10月6日

HittER: Hierarchical Transformers for Knowledge Graph Embeddings

Arxiv

1+阅读 · 2021年10月6日

Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers

Arxiv

0+阅读 · 2021年10月5日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Pretrained Transformers Improve Out-of-Distribution Robustness

Arxiv

5+阅读 · 2020年4月13日

Cloze-driven Pretraining of Self-attention Networks

Arxiv

6+阅读 · 2019年3月19日

VIP会员

文章信息

相关主题

Transformer模型

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

320+阅读 · 2020年11月26日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

专知会员服务

51+阅读 · 2020年11月19日

Transformer模型-深度学习自然语言处理，17页ppt

Transformer模型-深度学习自然语言处理，17页ppt

专知会员服务

107+阅读 · 2020年8月30日

元学习与图神经网络逻辑推导，55页ppt

元学习与图神经网络逻辑推导，55页ppt

专知会员服务

129+阅读 · 2020年4月25日

BERT进展2019四篇必读论文

BERT进展2019四篇必读论文

专知会员服务

69+阅读 · 2020年1月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

新质生成式AI赋能产业变革的实践与路径

用于多模态大模型的离散标记化：全面综述

Nature综述：金融网络中的物理学

【CMU博士论文】通信高效且差分隐私的优化方法

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Iterative Decoding for Compositional Generalization in Transformers

Iterative Decoding for Compositional Generalization in Transformers

Arxiv

0+阅读 · 2021年10月8日

Towards Continual Knowledge Learning of Language Models

Arxiv

0+阅读 · 2021年10月7日

Long-tailed Distribution Adaptation

Arxiv

1+阅读 · 2021年10月6日

Learning to Iteratively Solve Routing Problems with Dual-Aspect Collaborative Transformer

Arxiv

0+阅读 · 2021年10月6日

HittER: Hierarchical Transformers for Knowledge Graph Embeddings

Arxiv

1+阅读 · 2021年10月6日

Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers

Arxiv

0+阅读 · 2021年10月5日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Pretrained Transformers Improve Out-of-Distribution Robustness

Arxiv

5+阅读 · 2020年4月13日

Cloze-driven Pretraining of Self-attention Networks

Arxiv

6+阅读 · 2019年3月19日

微信扫码咨询专知VIP会员