We propose a general yet simple patch that can be applied to existing regularization-based continual learning methods, called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this additional term can be interpreted as a projection of the conditional probability given by a classifier's output onto the uniform distribution. By applying the Pythagorean theorem for KL divergence, we then prove that this projection may (in theory) improve the performance of continual learning methods. In our extensive experimental results, we apply CPR to several state-of-the-art regularization-based continual learning methods and benchmark performance on popular image recognition datasets. Our results demonstrate that CPR indeed promotes wide local minima and significantly improves both accuracy and plasticity while simultaneously mitigating the catastrophic forgetting of baseline continual learning methods. The code and scripts for this work are available at https://github.com/csm9493/CPR_CL.
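As a concrete illustration of the regularization term described above, the following is a minimal PyTorch sketch of an entropy-maximizing (equivalently, KL-to-uniform) penalty added to a continual learning loss. The function names (`cpr_term`, `training_loss`) and the coefficient `beta` are illustrative assumptions, not the authors' reference implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn.functional as F

def cpr_term(logits: torch.Tensor) -> torch.Tensor:
    """KL divergence between the classifier's softmax output and the
    uniform distribution, averaged over the batch.

    Minimizing this quantity maximizes the entropy of the output
    distribution, i.e., it projects the classifier's conditional
    probability toward the uniform distribution.
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)      # log p(y | x)
    probs = log_probs.exp()
    log_uniform = -torch.log(torch.tensor(float(num_classes)))
    # KL(p || u) = sum_y p(y) * (log p(y) - log u(y))
    kl = (probs * (log_probs - log_uniform)).sum(dim=-1)
    return kl.mean()

# Hypothetical usage inside a training step of a regularization-based
# continual learning method: the method's own regularizer (e.g., an
# EWC-style penalty) is kept as-is, and CPR only adds the entropy term.
def training_loss(logits, targets, base_regularizer, beta=0.5):
    task_loss = F.cross_entropy(logits, targets)
    return task_loss + base_regularizer + beta * cpr_term(logits)
```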