Recently, domain-specific PLMs have been proposed to boost task performance in specific domains (e.g., biomedicine and computer science) by continuing to pre-train general PLMs on domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al., 2020) tends to forget the general knowledge previously acquired by general PLMs, which leads to a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate this problem, we propose a new framework, the General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM with a memory representation built from a frozen general PLM, without losing any general knowledge. Specifically, we propose a new memory-augmented layer and, based on it, explore different augmentation strategies to build the memory representation and adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds of tasks (text classification, QA, NER), and the extensive results show that the proposed G-MAP achieves SOTA results on all tasks.
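To make the fusion idea concrete, the following is a minimal, hypothetical PyTorch sketch of one memory-augmented layer. It assumes the layer cross-attends from the domain-specific PLM's hidden states to a memory representation produced by the frozen general PLM, and then fuses the attended memory back through a learned sigmoid gate; the class name, gating form, and tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class MemoryAugmentedLayer(nn.Module):
    """Hypothetical sketch of a memory-augmented layer (not the official G-MAP code).

    Queries come from the domain-specific PLM's hidden states; keys/values come
    from a memory representation built by a frozen general PLM. A sigmoid gate
    adaptively mixes the attended general-knowledge memory into the domain stream.
    """

    def __init__(self, hidden_size: int, num_heads: int = 12):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * hidden_size, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, domain_hidden: torch.Tensor, general_memory: torch.Tensor) -> torch.Tensor:
        # domain_hidden:  (batch, seq_len, hidden) from the domain-specific PLM
        # general_memory: (batch, mem_len, hidden) from the frozen general PLM
        attended, _ = self.cross_attn(domain_hidden, general_memory, general_memory)
        # Adaptive fusion: the gate decides how much general knowledge to mix in per token.
        gate = torch.sigmoid(self.gate(torch.cat([domain_hidden, attended], dim=-1)))
        fused = gate * attended + (1.0 - gate) * domain_hidden
        return self.layer_norm(fused)


if __name__ == "__main__":
    layer = MemoryAugmentedLayer(hidden_size=768)
    domain_hidden = torch.randn(2, 16, 768)   # toy domain-PLM hidden states
    general_memory = torch.randn(2, 16, 768)  # toy frozen general-PLM memory states
    print(layer(domain_hidden, general_memory).shape)  # torch.Size([2, 16, 768])
```

Because the general PLM is frozen, its memory representation can be computed once per input and reused, so only the domain-specific PLM and the fusion parameters receive gradient updates during continued pre-training.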