A Big List of BERT / Transformer / Transfer Learning Resources for NLP

June 9, 2019 · Zhuanzhi (专知)
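[Introduction] This project, maintained by cedrickchee, collects machine (deep) learning resources for natural language processing (NLP). It focuses on Bidirectional Encoder Representations from Transformers (BERT), the attention mechanism, Transformer architectures/networks, and transfer learning in NLP.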


https://github.com/cedrickchee/awesome-bert-nlp


Papers

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

  2. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov.

  • Uses smart caching to improve the learning of long-term dependencies in the Transformer. Key results: state-of-the-art on 5 language modeling benchmarks, including a perplexity of 21.8 on One Billion Word (LM1B) and 0.99 bpc on enwiki8. The authors claim that the method is more flexible, faster during evaluation (1874 times speedup), generalizes well on small datasets, and is effective at modeling both short and long sequences.

  3. Conditional BERT Contextual Augmentation by Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han and Songlin Hu.

  4. SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering by Chenguang Zhu, Michael Zeng and Xuedong Huang.

  5. Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.

  6. The Evolved Transformer by David R. So, Chen Liang and Quoc V. Le.

  • The authors use architecture search to improve the Transformer architecture. The key is to use evolution and to seed the initial population with the Transformer itself. The resulting architecture is better and more efficient, especially for small models.
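
The idea in the last note, an evolutionary search whose initial population is seeded with a known architecture, can be illustrated with a toy sketch. This is not the procedure from the paper: the search space, the seed configuration, and `toy_fitness` are all made-up stand-ins (a real search would train and evaluate every candidate), and the function names are hypothetical.

```python
import random

# Hypothetical search space: an "architecture" is just a dict of settings.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "num_heads": [2, 4, 8, 16],
    "ffn_multiplier": [1, 2, 4, 8],
}

# Hypothetical stand-in for the known-good architecture used as the seed.
SEED_ARCHITECTURE = {"num_layers": 6, "num_heads": 8, "ffn_multiplier": 4}


def toy_fitness(arch):
    """Made-up score (higher is better); replaces real training and evaluation."""
    return (-abs(arch["num_layers"] - 4)
            - abs(arch["num_heads"] - 8)
            - abs(arch["ffn_multiplier"] - 2))


def mutate(arch, rng):
    """Copy the parent and resample one randomly chosen setting."""
    child = dict(arch)
    key = rng.choice(sorted(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def evolve(steps=200, population_size=10, tournament_size=3, seed=0):
    rng = random.Random(seed)
    # Seeding: every initial individual is the known architecture,
    # instead of a randomly sampled one.
    population = [dict(SEED_ARCHITECTURE) for _ in range(population_size)]
    for _ in range(steps):
        # Tournament: sample a few individuals, breed the best, replace the worst.
        contenders = rng.sample(range(population_size), tournament_size)
        contenders.sort(key=lambda i: toy_fitness(population[i]))
        worst, best = contenders[0], contenders[-1]
        population[worst] = mutate(population[best], rng)
    return max(population, key=toy_fitness)


if __name__ == "__main__":
    print(evolve())
```

Because only the weakest member of each tournament is replaced, the best score in the population never drops below that of the seed, which is the practical appeal of starting from a strong architecture.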

Articles

BERT and Transformer

  1. Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing from Google AI.

  2. The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning).

  3. Dissecting BERT by Miguel Romero and Francisco Ingham - Understand BERT in depth with an intuitive, straightforward explanation of the relevant concepts.

  4. A Light Introduction to Transformer-XL.

  5. Generalized Language Models by Lilian Weng, Research Scientist at OpenAI.

Attention Concept

  1. The Annotated Transformer by Harvard NLP Group - Further reading to understand the "Attention is all you need" paper.

  2. Attention? Attention! - Attention guide by Lilian Weng from OpenAI.

  3. Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) by Jay Alammar, an Instructor from Udacity ML Engineer Nanodegree.

Transformer Architecture

  1. The Transformer blog post.

  2. The Illustrated Transformer by Jay Alammar, an Instructor from Udacity ML Engineer Nanodegree.

  3. Watch Łukasz Kaiser’s talk walking through the model and its details.

  4. Transformer-XL: Unleashing the Potential of Attention Models by Google Brain.

  5. Generative Modeling with Sparse Transformers by OpenAI - an algorithmic improvement of the attention mechanism to extract patterns from sequences 30x longer than possible previously.

OpenAI Generative Pre-Training Transformer (GPT) and GPT-2

  1. Better Language Models and Their Implications.

  2. Improving Language Understanding with Unsupervised Learning - this is an overview of the original GPT model.

  3. 🦄  How to build a State-of-the-Art Conversational AI with Transfer Learning by Hugging Face.

Additional Reading

  1. How to Build OpenAI's GPT-2: "The AI That's Too Dangerous to Release".

  2. OpenAI’s GPT2 - Food to Media hype or Wake Up Call?

Official Implementations

  1. google-research/bert - TensorFlow code and pre-trained models for BERT.

Other Implementations

PyTorch

  1. huggingface/pytorch-pretrained-BERT - A PyTorch implementation of Google AI's BERT model with script to load Google's pre-trained models by Hugging Face.

  2. codertimo/BERT-pytorch - Google AI 2018 BERT PyTorch implementation.

  3. innodatalabs/tbert - PyTorch port of BERT ML model.

  4. kimiyoung/transformer-xl - Code repository associated with the Transformer-XL paper.

  5. dreamgonfly/BERT-pytorch - PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".

  6. dhlee347/pytorchic-bert - PyTorch implementation of Google BERT.

Keras

  1. Separius/BERT-keras - Keras implementation of BERT with pre-trained weights.

  2. CyberZHG/keras-bert - Implementation of BERT that could load official pre-trained models for feature extraction and prediction.

TensorFlow

  1. guotong1988/BERT-tensorflow - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

  2. kimiyoung/transformer-xl - Code repository associated with the Transformer-XL paper.

Chainer

  1. soskek/bert-chainer - Chainer implementation of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".


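Related content

BERT stands for Bidirectional Encoder Representations from Transformers. It is a method for pre-training language representations: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia) and then applied to downstream NLP tasks such as machine translation and question answering.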

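1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (the BERT paper)

Google's BERT wins Best Long Paper! The five best papers of NAACL 2019, a top NLP conference, are announced

Google · NAACL 2019 Best Paper

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for many tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on 11 NLP tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute improvement), and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5 absolute improvement), which is 2.0 points above human performance.

URL:

https://www.zhuanzhi.ai/paper/7acdc843627c496a2ad7fb2785357dec

BERT slides: Stanford talk by BERT first author Jacob Devlin, an introduction to BERT with Q&A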

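2. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Google, CMU

Authors: Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

Abstract: Transformer networks have the potential to learn longer-term dependencies, but in language modeling they are limited by a fixed-length context. As a solution, we propose a novel neural architecture, Transformer-XL, that enables the Transformer to learn dependencies beyond a fixed length without disrupting temporal coherence. Concretely, it consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only captures longer-term dependencies but also resolves the problem of context fragmentation. Transformer-XL learns dependencies that are about 80% longer than those of RNNs and 450% longer than those of vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than the vanilla Transformer during evaluation. We also improve the state-of-the-art bpc/perplexity results from 1.06 to 0.99 on enwiki8, from 1.13 to 1.08 on text8, from 20.5 to 18.3 on WikiText-103, from 23.7 to 21.8 on One Billion Word, and from 55.3 to 54.5 on Penn Treebank (without fine-tuning). Our code, pretrained models, and hyperparameters are available in both TensorFlow and PyTorch.

URL: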

https://www.zhuanzhi.ai/paper/5c1ec941e06a20e4966a3db298b45211
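
The segment-level recurrence described in the abstract can be sketched with a toy, single-layer example. Everything here is an illustrative assumption rather than the authors' code: `causal_attention` and `run_with_memory` are hypothetical names, hidden states are plain Python lists, and the sketch leaves out the relative positional encoding as well as the per-layer, gradient-stopped caches of the actual model.

```python
import math


def causal_attention(queries, keys, values, offset):
    """Scaled dot-product attention; query i may see keys[0 : offset + i + 1]."""
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = offset + i + 1
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys[:visible]]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values[:visible])) / total
            for j in range(dim)
        ])
    return outputs


def run_with_memory(hidden, seg_len, mem_len):
    """Process `hidden` segment by segment, reusing cached states as extra context."""
    memory, outputs, context_sizes = [], [], []
    for start in range(0, len(hidden), seg_len):
        segment = hidden[start:start + seg_len]
        keys = memory + segment  # cached states extend the attention context
        outputs.extend(causal_attention(segment, keys, keys, offset=len(memory)))
        context_sizes.extend(len(memory) + i + 1 for i in range(len(segment)))
        # Keep only the most recent `mem_len` states for the next segment.
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return outputs, context_sizes


if __name__ == "__main__":
    states = [[float(i), 1.0] for i in range(6)]
    _, with_cache = run_with_memory(states, seg_len=2, mem_len=2)
    _, without_cache = run_with_memory(states, seg_len=2, mem_len=0)
    print(with_cache)     # [1, 2, 3, 4, 3, 4]
    print(without_cache)  # [1, 2, 1, 2, 1, 2]
```

With the cache, positions in later segments attend to more states than the segment length alone allows. Without it, every segment starts from an empty context, which is the context fragmentation the abstract mentions.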

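3. XLNet: Generalized Autoregressive Pretraining for Language Understanding

Google, CMU

Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

Abstract: Because it models bidirectional context, BERT, a typical denoising-autoencoding pretraining method, achieves better results than autoregressive language models. In other words, access to bidirectional context matters for language understanding. BERT nevertheless has two shortcomings: (1) it relies on corrupting the input with mask tokens, which appear during pretraining but not during fine-tuning, so the two stages are mismatched; (2) it neglects the dependencies between the masked positions. In light of these pros and cons, the paper proposes XLNet, a generalized autoregressive pretraining method. XLNet's contributions are (1) a new way to learn bidirectional context, by maximizing the expected likelihood over all permutations of the factorization order, and (2) overcoming the limitations of BERT through its autoregressive formulation. XLNet also integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Experimental results: XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks, including question answering, natural language inference, sentiment analysis, and document ranking.

URL: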

https://www.zhuanzhi.ai/paper/74979afe231290d0c1ad43d4fab17b09
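
The permutation-based objective in contribution (1) can be written compactly. The notation below follows the XLNet paper and is not part of this article: $\mathcal{Z}_T$ is the set of all permutations of the index sequence $[1, \dots, T]$, $z_t$ is the $t$-th element of a permutation $\mathbf{z}$, and $\mathbf{z}_{<t}$ denotes its first $t-1$ elements.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
```

Each term is an ordinary autoregressive prediction, so no mask token is needed. Because the expectation ranges over factorization orders, every position is eventually predicted from context on both sides.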

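4. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Abstract: Increasing model size when pretraining natural language representations often improves performance on downstream tasks. At some point, however, further increases become harder because of GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, researchers from Google present two parameter-reduction techniques that lower memory consumption and increase the training speed of BERT. Comprehensive experiments show that ALBERT scales better than the original BERT. They also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show that it consistently helps downstream tasks with multi-sentence inputs. ALBERT establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters than BERT-large.

URL: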

https://www.zhuanzhi.ai/paper/a0067ac863579c6268b0751e12decd04
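
The abstract does not name the two parameter-reduction techniques; in the ALBERT paper they are factorized embedding parameterization and cross-layer parameter sharing. The arithmetic below is a rough sketch of why each one shrinks the model. The sizes (vocabulary 30,000, hidden size 768, embedding size 128, 12 layers) are illustrative values, the function names are hypothetical, and the counts ignore biases and layer normalization.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Embedding weights: V*H directly, or V*E + E*H when factorized."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_weight_params(hidden_size, num_layers, share_across_layers):
    """Weight matrices of a Transformer encoder stack (biases omitted)."""
    attention = 4 * hidden_size * hidden_size            # Q, K, V, output projections
    feed_forward = 2 * hidden_size * (4 * hidden_size)   # two layers, inner size 4H
    per_layer = attention + feed_forward
    # With cross-layer sharing, one set of weights is reused by every layer.
    return per_layer if share_across_layers else per_layer * num_layers


if __name__ == "__main__":
    print(embedding_params(30000, 768))         # 23040000
    print(embedding_params(30000, 768, 128))    # 3938304
    print(encoder_weight_params(768, 12, False))  # 84934656
    print(encoder_weight_params(768, 12, True))   # 7077888
```

Sharing weights reduces the number of stored parameters but not the amount of computation per forward pass, since every layer still runs.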

For more papers on pre-trained language models, see:

https://github.com/thunlp/PLMpapers

Transformer networks have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, Transformer-XL, that enables Transformer to learn dependency beyond a fixed length without disrupting temporal coherence. Concretely, it consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the problem of context fragmentation. As a result, Transformer-XL learns dependency that is about 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformer during evaluation. Additionally, we improve the state-of-the-art (SoTA) results of bpc/perplexity from 1.06 to 0.99 on enwiki8, from 1.13 to 1.08 on text8, from 20.5 to 18.3 on WikiText-103, from 23.7 to 21.8 on One Billion Word, and from 55.3 to 54.5 on Penn Treebank (without finetuning). Our code, pretrained models, and hyperparameters are available in both TensorFlow and PyTorch.