Continual learning (CL) over non-stationary data streams remains one of the long-standing challenges for deep neural networks (DNNs), as they are prone to catastrophic forgetting. CL models can benefit from self-supervised pre-training as it enables learning more generalizable task-agnostic features. However, the effect of self-supervised pre-training diminishes as the length of the task sequence increases. Furthermore, the domain shift between the pre-training data distribution and the task distribution reduces the generalizability of the learned representations. To address these limitations, we propose Task Agnostic Representation Consolidation (TARC), a two-stage training paradigm for CL that intertwines task-agnostic and task-specific learning, whereby self-supervised training is followed by supervised learning for each task. To further restrict the deviation from the representations learned in the self-supervised stage, we employ a task-agnostic auxiliary loss during the supervised stage. We show that our training paradigm can be easily added to memory- or regularization-based approaches and provides consistent performance gains across more challenging CL settings. We further show that it leads to more robust and well-calibrated models.
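The two-stage recipe outlined above can be summarized in a short sketch. The code below only illustrates the ordering of the stages and the auxiliary loss, not the authors' actual implementation: the names train_task, ssl_head, cls_head, the aux_weight coefficient, the stand-in self-supervised loss, and the SGD optimizer are all assumptions introduced for this example.

```python
# Minimal sketch of a TARC-style two-stage, per-task training loop.
# All concrete choices (heads, losses, optimizer, aux_weight) are illustrative
# assumptions, not the setup used in the paper.
import torch
from torch import nn


def train_task(model, ssl_head, cls_head, loader, ssl_loss, sup_loss,
               ssl_epochs=1, sup_epochs=1, aux_weight=0.1, lr=0.01):
    """Stage 1: task-agnostic self-supervised training on the current task.
    Stage 2: supervised training with a task-agnostic auxiliary loss that
    restricts drift from the representations learned in stage 1."""
    params = (list(model.parameters()) + list(ssl_head.parameters())
              + list(cls_head.parameters()))
    opt = torch.optim.SGD(params, lr=lr)

    # Stage 1: self-supervised (task-agnostic) learning.
    for _ in range(ssl_epochs):
        for x, _ in loader:
            loss = ssl_loss(ssl_head(model(x)), x)
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: supervised (task-specific) learning plus the auxiliary
    # task-agnostic loss on the shared features.
    for _ in range(sup_epochs):
        for x, y in loader:
            feats = model(x)
            loss = (sup_loss(cls_head(feats), y)
                    + aux_weight * ssl_loss(ssl_head(feats), x))
            opt.zero_grad(); loss.backward(); opt.step()


if __name__ == "__main__":
    # Toy usage with random data; dimensions and the "self-supervised" loss
    # (reconstructing a slice of the input) are placeholders.
    model = nn.Linear(32, 16)
    ssl_head, cls_head = nn.Linear(16, 16), nn.Linear(16, 5)
    data = [(torch.randn(8, 32), torch.randint(0, 5, (8,))) for _ in range(4)]
    ssl_loss = lambda z, x: ((z - x[:, :16]) ** 2).mean()
    train_task(model, ssl_head, cls_head, data, ssl_loss, nn.CrossEntropyLoss())
```

In this sketch the backbone and the self-supervised head are reused across stages within each task, so the auxiliary term in stage 2 simply keeps the shared features compatible with the task-agnostic objective; in an actual CL pipeline this loop would be wrapped by a memory- or regularization-based method as noted in the abstract.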