MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model

April 3, 2023


Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang

From arXiv, 4 pages

In natural language processing, pre-trained language models have become essential infrastructures. However, these models often suffer from issues such as large size, long inference time, and challenging deployment. Moreover, most mainstream pre-trained models focus on English, and there are insufficient studies on small Chinese pre-trained models. In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing. MiniRBT employs a narrow and deep student model and incorporates whole word masking and two-stage distillation during pre-training to make it well-suited for most downstream tasks. Our experiments on machine reading comprehension and text classification tasks reveal that MiniRBT achieves 94% performance relative to RoBERTa, while providing a 6.8x speedup, demonstrating its effectiveness and efficiency.
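
The abstract names distillation as the core training technique but does not spell out the objective. As a rough illustration only, the sketch below shows the generic soft-label distillation loss commonly used to train a small student against a larger teacher: a temperature-softened KL divergence between the two output distributions. It is not MiniRBT's actual two-stage procedure. The function names, the temperature value, and the toy logits are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-label distillation loss; assumed here for illustration,
    not taken from the MiniRBT paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits (hypothetical): one student agrees with the teacher, one does not.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

A student whose predictions track the teacher's gets a smaller loss than one that ranks the classes in the opposite order, which is the signal the student is trained to minimize.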

