[AAAI 2021] LRC-BERT: Contrastive Learning for Latent Semantic Knowledge Distillation - Zhuanzhi


December 31, 2020 · Zhuanzhi

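The R&D team at the Amap (高德) Intelligent Technology Center designed a contrastive learning framework for knowledge distillation and, building on it, proposed the COS-NCE loss. The paper has been accepted to AAAI 2021, a top AI conference.

Natural language processing (NLP) plays an important role across Amap's business lines, for example real-time recognition of dynamic event names, understanding user intent in search, and automatic liability determination from call transcripts in shared mobility.

The most important recent advance in NLP is arguably the pre-trained model. Google's BERT pre-trained language model topped the major NLP leaderboards as soon as it was released, improved performance on many NLP tasks, and set the best results on 11 different NLP benchmarks. Pre-trained models have become one of the main trends in natural language understanding.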

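Training a pre-trained model usually involves two stages:

The first stage trains on a large corpus to predict specific text from its given context.

The second stage fine-tunes the model on a specific downstream task.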

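BERT's strength is beyond doubt, but the model is bulky, with more than a hundred million parameters, and a single forward pass for one sample can easily take upwards of a hundred milliseconds. This makes deploying it in online services difficult, so slimming BERT down has become a key problem for both industry and academia.

Hinton's paper "Distilling the Knowledge in a Neural Network" introduced the concept of knowledge distillation: the knowledge of a teacher network is compressed into a student network, which retains predictive ability close to the teacher's while offering faster inference, greatly reducing computational cost.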
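To make the teacher/student idea concrete, here is a minimal sketch of a soft-target distillation loss in plain Python. It assumes the common formulation in which both networks' logits are softened with a temperature and the student is trained on the cross-entropy against the teacher's distribution. The function names, the example logits, and the temperature value are illustrative assumptions, and the temperature-squared scaling that is often applied in practice is left out for brevity.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Hypothetical logits for a 3-class task.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different class

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The student whose logits track the teacher's receives the smaller loss, which is the signal that drives the student toward the teacher's behavior.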

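Current state-of-the-art techniques include Microsoft's BERT-PKD (Patient Knowledge Distillation for BERT), Hugging Face's DistilBERT, and Huawei's TinyBERT. Their basic idea is the same: reduce the number of Transformer encoder layers and the hidden size. The implementation details vary, and the main differences lie in how the loss is designed.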
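As a rough illustration of how much the layer count and hidden size matter, the sketch below counts the parameters of a BERT-style encoder. The article gives no configurations, so the numbers here are assumptions: the teacher uses the widely published BERT-base settings (12 layers, hidden size 768, feed-forward size 3072, vocabulary 30522), and the student configuration is a hypothetical small model chosen only for comparison.

```python
def bert_param_count(num_layers, hidden, intermediate,
                     vocab=30522, max_pos=512, type_vocab=2):
    """Count parameters of a BERT-style encoder (embeddings + layers + pooler)."""
    layer_norm = 2 * hidden  # scale and bias
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    feed_forward = (hidden * intermediate + intermediate) \
        + (intermediate * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + num_layers * per_layer + pooler

# Assumed BERT-base style teacher and a hypothetical smaller student.
teacher_params = bert_param_count(num_layers=12, hidden=768, intermediate=3072)
student_params = bert_param_count(num_layers=4, hidden=312, intermediate=1200)

print(teacher_params)
print(student_params)
print(round(teacher_params / student_params, 1))
```

Because the attention and feed-forward terms grow with the square of the hidden size, shrinking the hidden size reduces the per-layer cost much faster than removing layers alone.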

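However, the core problem in knowledge distillation is how to capture the model's latent semantic information. Previous work has concentrated on loss design in a way that makes the model attend to the representation details of individual samples, which does little to capture latent semantic information.

The R&D team at the Amap Intelligent Technology Center designed a contrastive learning framework for knowledge distillation and, building on it, proposed the COS-NCE loss. Optimizing the COS-NCE loss pulls positive samples closer and pushes negative samples farther apart, which lets the model learn latent semantic representations effectively. (Unlike DistilBERT and BERT-PKD, LRC-BERT places no restriction on model structure: the student network can freely choose its architecture and feature dimensions.)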
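The pull/push behavior described above can be sketched with a simple cosine-distance contrastive loss. This is not the paper's COS-NCE formula, which the article does not give. It is a generic margin-based stand-in, and it assumes that the positive is the teacher's representation of the same sample and the negatives are the teacher's representations of other samples. All names, vectors, and the margin value are illustrative.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def contrastive_cosine_loss(student, teacher_pos, teacher_negs, margin=1.0):
    """Pull the student toward the positive; push it at least `margin` from negatives."""
    pull = cosine_distance(student, teacher_pos)
    push = sum(max(0.0, margin - cosine_distance(student, n)) for n in teacher_negs)
    return pull + push / len(teacher_negs)

# Hypothetical 3-dimensional intermediate representations.
teacher_pos = [1.0, 0.0, 0.0]
teacher_negs = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

aligned = contrastive_cosine_loss([0.9, 0.1, 0.0], teacher_pos, teacher_negs)
confused = contrastive_cosine_loss([0.1, 0.9, 0.0], teacher_pos, teacher_negs)
print(aligned, confused)
```

A student vector aligned with the positive scores a lower loss than one that drifts toward a negative. Cosine distance needs vectors of equal length, so a student with a different feature dimension would presumably need a learned projection first; that step is omitted here.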

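To help LRC-BERT learn even more effectively, we also designed a two-stage training procedure. Finally, LRC-BERT applies gradient perturbation at the word vector embedding layer to improve the model's robustness.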
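The article does not spell out how the gradient perturbation is computed. One common way to perturb embeddings with gradient information is to shift them a small step along the normalized gradient of the loss, as in fast-gradient-style adversarial training. The sketch below assumes that scheme and uses a toy squared-error loss with an analytic gradient in place of the real distillation loss. Every name and value in it is hypothetical.

```python
import math

def toy_loss(embedding, target):
    """Squared-error stand-in for the distillation loss."""
    return sum((e - t) ** 2 for e, t in zip(embedding, target))

def toy_loss_grad(embedding, target):
    """Analytic gradient of toy_loss with respect to the embedding."""
    return [2.0 * (e - t) for e, t in zip(embedding, target)]

def perturb_embedding(embedding, grad, epsilon=0.1):
    """Shift the embedding by epsilon along the normalized gradient direction."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)  # no gradient signal, leave the embedding unchanged
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]

embedding = [0.5, -0.2, 0.1]
target = [0.4, 0.0, 0.3]

grad = toy_loss_grad(embedding, target)
perturbed = perturb_embedding(embedding, grad)

print(toy_loss(embedding, target))
print(toy_loss(perturbed, target))
```

The perturbed embedding yields a higher loss than the original, so training on it forces the model to stay accurate under small worst-case shifts of its input representation, which is the usual argument for why such perturbation improves robustness.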

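The main contributions of this paper are summarized as follows:

It proposes a contrastive learning framework for knowledge distillation and, on that basis, the COS-NCE loss, which effectively captures latent semantic information.

It introduces gradient perturbation into knowledge distillation for the first time and verifies experimentally that it improves model robustness.
It proposes a two-stage training method that extracts latent semantic information from the intermediate layers more efficiently.
It achieves state-of-the-art results among distilled models on the General Language Understanding Evaluation (GLUE) benchmark.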
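The two-stage training method is described only at a high level here. One plausible reading, taken as an assumption for this sketch, is that the first stage optimizes only the intermediate-layer contrastive loss and the second stage adds prediction-level losses (soft labels from the teacher and hard ground-truth labels). The weights, step threshold, and function names below are hypothetical.

```python
def loss_weights(step, stage_one_steps):
    """Return (contrastive, soft_label, hard_label) weights for a training step."""
    if step < stage_one_steps:
        return (1.0, 0.0, 0.0)  # stage 1: intermediate-layer contrastive loss only
    return (1.0, 1.0, 1.0)      # stage 2: add the prediction-level losses

def total_loss(step, contrastive, soft_label, hard_label, stage_one_steps=1000):
    """Combine the three loss terms according to the current training stage."""
    w_c, w_s, w_h = loss_weights(step, stage_one_steps)
    return w_c * contrastive + w_s * soft_label + w_h * hard_label

# Hypothetical per-batch loss values.
print(total_loss(10, contrastive=0.5, soft_label=0.3, hard_label=0.2))
print(total_loss(2000, contrastive=0.5, soft_label=0.3, hard_label=0.2))
```

Separating the stages this way lets the student first align its intermediate representations with the teacher's before the task-level objectives start competing for capacity.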

https://www.zhuanzhi.ai/paper/9f999c711b0341b5df16076ce71f02ac
