The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers - Zhuanzhi Papers


October 19, 2021

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers


Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

From arXiv; accepted to EMNLP 2021

Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and the Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results.

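The abstract's point about validation sets can be illustrated with a small sketch. It assumes a training run that logs accuracy on both an IID validation split and a generalization validation split at each evaluation step. The `Checkpoint` class, the `select_checkpoint` helper, and all numbers below are hypothetical and made up for illustration; they are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Checkpoint:
    """Accuracies recorded at one evaluation step (all values hypothetical)."""
    step: int
    iid_acc: float  # accuracy on an IID validation split
    gen_acc: float  # accuracy on a generalization validation split


def select_checkpoint(history: List[Checkpoint], metric: str) -> Checkpoint:
    """Return the earliest checkpoint that reaches the best value of `metric`."""
    best = history[0]
    for ckpt in history[1:]:
        # Strict comparison: ties keep the earlier checkpoint, which mimics
        # early stopping once the monitored metric stops improving.
        if getattr(ckpt, metric) > getattr(best, metric):
            best = ckpt
    return best


# Made-up numbers for illustration only; these are not results from the paper.
history = [
    Checkpoint(step=1000, iid_acc=0.90, gen_acc=0.40),
    Checkpoint(step=2000, iid_acc=0.99, gen_acc=0.55),
    Checkpoint(step=3000, iid_acc=1.00, gen_acc=0.70),
    Checkpoint(step=4000, iid_acc=1.00, gen_acc=0.85),
]

by_iid = select_checkpoint(history, "iid_acc")
by_gen = select_checkpoint(history, "gen_acc")
print(by_iid.step, by_gen.step)  # prints: 3000 4000
```

In this toy log the IID accuracy saturates at step 3000, so selecting on it stops there, while the generalization split still separates the later checkpoint. This is only one way the effect described in the abstract could show up in practice.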

