Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have demonstrated excellent results on question answering tasks. However, due to the sheer number of model parameters, inference with these models is very slow. How to apply these complex models to real business scenarios thus becomes a challenging but practical problem. Previous model compression methods usually suffer from information loss during the compression procedure, yielding inferior models compared with the original one. To tackle this challenge, we propose a Two-stage Multi-teacher Knowledge Distillation (TMKD for short) method for web Question Answering systems. We first develop a general Q\&A distillation task for student model pre-training, and then fine-tune this pre-trained student model with multi-teacher knowledge distillation on downstream tasks (such as the Web Q\&A task and the MNLI, SNLI, and RTE tasks from GLUE), which effectively reduces the overfitting bias of individual teacher models and transfers more general knowledge to the student model. The experimental results show that our method significantly outperforms the baseline methods and even achieves results comparable to the original teacher models, along with a substantial speedup of model inference.
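To make the core idea concrete, the following is a minimal sketch of a multi-teacher distillation loss in PyTorch. It is not the paper's exact formulation: the function name `multi_teacher_distillation_loss`, the `temperature` and `alpha` hyperparameters, and the simple averaging of teacher distributions are all illustrative assumptions; the method described above additionally involves a Q\&A distillation pre-training stage that is not shown here.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    gold_labels, temperature=2.0, alpha=0.5):
    """Blend soft-label distillation from several teachers with the
    hard-label task loss. Averaging the teachers' softened distributions
    is one simple way to combine them (an assumption for this sketch)."""
    # Soften each teacher's output distribution, then average across teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student's softened distribution and the
    # averaged teacher distribution, scaled by T^2 as in standard KD.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the gold labels.
    ce_loss = F.cross_entropy(student_logits, gold_labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In this sketch, learning from the averaged soft labels of several teachers is what lets the student smooth out the idiosyncratic (overfitting) biases of any single teacher, which is the intuition behind the multi-teacher stage described above.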