Traditional knowledge distillation uses a two-stage training strategy to transfer knowledge from a high-capacity teacher model to a compact student model, which relies heavily on the pre-trained teacher. Recent online knowledge distillation alleviates this limitation by collaborative learning, mutual learning and online ensembling, following a one-stage end-to-end training fashion. However, collaborative learning and mutual learning fail to construct an online high-capacity teacher, whilst online ensembling ignores the collaboration among branches and its logit summation impedes the further optimisation of the ensemble teacher. In this work, we propose a novel Peer Collaborative Learning method for online knowledge distillation, which integrates online ensembling and network collaboration into a unified framework. Specifically, given a target network, we construct a multi-branch network for training, in which each branch is called a peer. We perform random augmentation multiple times on the inputs to the peers and assemble the feature representations output by the peers with an additional classifier as the peer ensemble teacher. This helps to transfer knowledge from a high-capacity teacher to the peers, and in turn further optimises the ensemble teacher. Meanwhile, we employ the temporal mean model of each peer as the peer mean teacher to collaboratively transfer knowledge among peers, which helps each peer to learn richer knowledge and facilitates the optimisation of a more stable model with better generalisation. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks and outperforms the state-of-the-art methods.
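The two mechanisms described above can be sketched in a minimal NumPy example. This is an illustrative sketch only, not the paper's implementation: the function names, the linear ensemble classifier, and the momentum value are all hypothetical, standing in for the peer ensemble teacher (concatenating peer features and feeding them to an additional classifier) and the peer mean teacher (a temporal mean, i.e. exponential moving average, of each peer's weights).

```python
import numpy as np

def peer_ensemble_logits(peer_features, ensemble_W, ensemble_b):
    """Peer ensemble teacher (sketch): concatenate the feature
    representations from all peers and pass them through an additional
    linear classifier to produce the teacher's logits."""
    fused = np.concatenate(peer_features, axis=-1)
    return fused @ ensemble_W + ensemble_b

def ema_update(teacher_params, student_params, momentum=0.999):
    """Peer mean teacher (sketch): maintain a temporal mean of a peer's
    parameters via an exponential moving average; the hypothetical
    momentum of 0.999 is a common choice for such mean teachers."""
    return {k: momentum * teacher_params[k]
               + (1.0 - momentum) * student_params[k]
            for k in teacher_params}

# Toy usage: three peers, each emitting a 4-d feature, 10 classes.
feats = [np.ones(4), np.ones(4), np.ones(4)]
logits = peer_ensemble_logits(feats, np.ones((12, 10)), np.zeros(10))
mean_teacher = ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9)
```

In training, the ensemble logits would supply a distillation target for every peer (so the high-capacity teacher improves the peers, and the improved peers in turn improve the ensemble), while each peer's mean-teacher copy would supply a second, more stable target for peer-to-peer collaboration.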