Recent advances in large pre-trained language models have greatly improved performance on a broad set of NLP tasks. However, adapting an existing model to new tasks often requires (repeated) re-training over enormous amounts of labeled data that are prohibitively expensive to obtain. Moreover, models trained on new tasks may gradually "forget" the knowledge learned from earlier tasks (i.e., catastrophic forgetting). In this paper, we study the challenge of lifelong few-shot learning over a sequence of diverse NLP tasks by continuously fine-tuning a language model. We investigate the model's ability to generalize to new tasks from few examples while retaining its performance on previously learned tasks. We explore existing continual learning methods for this problem and propose a continual meta-learning approach that learns to generate adapter weights from a few examples while regularizing changes to those weights to mitigate catastrophic forgetting. We demonstrate that our approach preserves model performance over training tasks and leads to positive knowledge transfer as future tasks are learned.
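The core idea of the approach, generating adapter weights from a few support examples while penalizing drift from previously generated weights, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the hypernetwork architecture, pooling scheme, and L2 drift penalty are all simplifying assumptions made here.

```python
import torch
import torch.nn as nn

class AdapterGenerator(nn.Module):
    """Hypothetical hypernetwork: maps a pooled representation of a few
    support examples to the weights of a bottleneck adapter."""

    def __init__(self, embed_dim=32, adapter_dim=8):
        super().__init__()
        self.embed_dim = embed_dim
        self.adapter_dim = adapter_dim
        # Emits a down-projection and an up-projection in one flat vector.
        self.hyper = nn.Linear(embed_dim, 2 * embed_dim * adapter_dim)

    def forward(self, support_embeddings):
        # support_embeddings: (n_examples, embed_dim) few-shot episode
        task_repr = support_embeddings.mean(dim=0)  # pool the episode
        flat = self.hyper(task_repr)
        down, up = flat.split(self.embed_dim * self.adapter_dim)
        return (down.view(self.adapter_dim, self.embed_dim),
                up.view(self.embed_dim, self.adapter_dim))

def adapter_forward(h, down, up):
    # Bottleneck adapter with a residual connection.
    return h + torch.relu(h @ down.t()) @ up.t()

def forgetting_penalty(new_weights, old_weights, lam=0.1):
    # L2 regularizer discouraging drift from previously generated weights,
    # a stand-in for the paper's weight-change regularization.
    return lam * sum(((n - o) ** 2).sum()
                     for n, o in zip(new_weights, old_weights))
```

In use, one would generate adapter weights for each new task's support set and add `forgetting_penalty` (comparing against the weights generated for earlier tasks) to the task loss, so that adapting to new tasks does not overwrite earlier task behavior.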