用深变换器模拟语言 (Language Modeling with Deep Transformers) - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · Transformer模型 · 变换 · 位置编码 ·

2019 年 7 月 11 日

Language Modeling with Deep Transformers

翻译：用深变换器模拟语言

Kazuki Irie,Albert Zeyer,Ralf Schlüter,Hermann Ney

from arxiv, To appear in the proceedings of INTERSPEECH 2019

We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured Transformer models outperform our baseline models based on the shallow stack of LSTM recurrent neural network layers. We carry out experiments on the open-source LibriSpeech 960hr task, for both 200K vocabulary word-level and 10K byte-pair encoding subword-level language modeling. We apply our word-level models to conventional hybrid speech recognition by lattice rescoring, and the subword-level models to attention based encoder-decoder models by shallow fusion. Second, we show that deep Transformer language models do not require positional encoding. The positional encoding is an essential augmentation for the self-attention mechanism which is invariant to sequence ordering. However, in autoregressive setup, as is the case for language modeling, the amount of information increases along the position dimension, which is a positional signal by its own. The analysis of attention weights shows that deep autoregressive self-attention models can automatically make use of such positional information. We find that removing the positional encoding even slightly improves the performance of these models.

翻译：我们探索语音识别语言模型中的深度自动递减变换模型。我们侧重于两个方面。首先, 我们重新审视用于语言模型的变换模型配置。我们显示, 精心配置的变换模型比基于浅层LSTM反复神经网络层的浅层的基线模型效果要强。我们实验了开放源 LibriSpeech 960hr 的LibriSpeech 960hr 任务, 包括200K 字级词汇和 10K 字节调编码子字节级语言模型。我们应用了我们的字级模型, 用于常规混合语音识别, 由 lattace 重新校准, 和亚字级模型, 用于关注基础的 encoder- decoder 模型。第二, 我们显示, 深变换语言模型不需要定位编码。定位编码是自控机制的重要增强力, 该机制不易排列顺序。然而, 在自动递增设置中, 语言模型的情况是, 信息量随定位维度增加, 这是它自己的定位信号。我们的重度分析显示, 甚深层递增性状态状态。

6

相关内容

语言模型化

语言模型化

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

专知会员服务

86+阅读 · 2020年6月23日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

95+阅读 · 2020年4月18日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

117+阅读 · 2020年2月3日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

专知会员服务

27+阅读 · 2019年11月10日

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

专知会员服务

157+阅读 · 2019年11月9日

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

专知会员服务

133+阅读 · 2019年10月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

AI前线

12+阅读 · 2019年7月22日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

17+阅读 · 2018年12月24日

338页新书《Deep Learning in Natural Language Processing》

338页新书《Deep Learning in Natural Language Processing》

机器学习算法与Python学习

9+阅读 · 2018年11月6日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

Data Augmentation using Pre-trained Transformer Models

Arxiv

15+阅读 · 2020年3月4日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Neural Speech Synthesis with Transformer Network

Neural Speech Synthesis with Transformer Network

Arxiv

5+阅读 · 2019年1月30日

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Arxiv

4+阅读 · 2019年1月9日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

14+阅读 · 2018年10月11日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

Deep Semantic Role Labeling with Self-Attention

Arxiv

13+阅读 · 2017年12月5日

Language Modeling with Gated Convolutional Networks

Arxiv

5+阅读 · 2017年9月8日

VIP会员

文章信息

相关主题

语言模型化

Transformer模型

相关VIP内容

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

专知会员服务

86+阅读 · 2020年6月23日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

95+阅读 · 2020年4月18日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

117+阅读 · 2020年2月3日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

专知会员服务

27+阅读 · 2019年11月10日

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

专知会员服务

157+阅读 · 2019年11月9日

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

专知会员服务

133+阅读 · 2019年10月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《量子与军事通信安全：量子计算在军事通信中的机遇、风险、实施挑战与前景分析》最新综述

上交大推出首个AI智能体协议全面综述：从碎片化到互联互通的智能体网络

中文版 | 作战云：重构现代战场（6000字长文）

中文版 | 开发水下战场空间：解析海底威胁特征

相关资讯

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

AI前线

12+阅读 · 2019年7月22日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

17+阅读 · 2018年12月24日

338页新书《Deep Learning in Natural Language Processing》

338页新书《Deep Learning in Natural Language Processing》

机器学习算法与Python学习

9+阅读 · 2018年11月6日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

相关论文

Data Augmentation using Pre-trained Transformer Models

Arxiv

15+阅读 · 2020年3月4日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Neural Speech Synthesis with Transformer Network

Neural Speech Synthesis with Transformer Network

Arxiv

5+阅读 · 2019年1月30日

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Arxiv

4+阅读 · 2019年1月9日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

14+阅读 · 2018年10月11日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

Deep Semantic Role Labeling with Self-Attention

Arxiv

13+阅读 · 2017年12月5日

Language Modeling with Gated Convolutional Networks

Arxiv

5+阅读 · 2017年9月8日

微信扫码咨询专知VIP会员