Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as training the student to mimic the teacher's output activations on individual data examples. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that instead transfers mutual relations among data examples. As concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in these relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular, for metric learning it allows students to outperform their teachers, achieving the state of the art on standard benchmark datasets.
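To make the idea of relational losses concrete, here is a minimal PyTorch sketch of what distance-wise and angle-wise distillation penalties could look like. The function names (`pdist`, `rkd_distance_loss`, `angles`, `rkd_angle_loss`), the Huber (smooth L1) penalty, and the mean-distance normalization are assumptions for illustration, not details stated in this abstract.

```python
import torch
import torch.nn.functional as F

def pdist(e, eps=1e-12):
    """Pairwise Euclidean distance matrix for a batch of embeddings e: (N, D)."""
    sq = e.pow(2).sum(dim=1)
    d2 = (sq.unsqueeze(1) + sq.unsqueeze(0) - 2.0 * (e @ e.t())).clamp(min=eps)
    d = d2.sqrt().clone()
    d[range(len(e)), range(len(e))] = 0  # exact zeros on the diagonal
    return d

def rkd_distance_loss(student, teacher):
    """Distance-wise loss: match pairwise distances, each scaled by its batch mean."""
    with torch.no_grad():
        t_d = pdist(teacher)
        t_d = t_d / t_d[t_d > 0].mean()  # mean normalization (assumed detail)
    s_d = pdist(student)
    s_d = s_d / s_d[s_d > 0].mean()
    return F.smooth_l1_loss(s_d, t_d)  # Huber penalty (assumed detail)

def angles(e):
    """Cosines of the angles formed at each anchor by every pair of other examples."""
    diff = F.normalize(e.unsqueeze(0) - e.unsqueeze(1), p=2, dim=2)  # (N, N, D)
    return torch.bmm(diff, diff.transpose(1, 2))  # (N, N, N)

def rkd_angle_loss(student, teacher):
    """Angle-wise loss: match angular relations among example triplets."""
    with torch.no_grad():
        t_a = angles(teacher)
    return F.smooth_l1_loss(angles(student), t_a)
```

In use, the two terms would be combined with scalar weights, e.g. `loss = rkd_distance_loss(s_emb, t_emb) + 2.0 * rkd_angle_loss(s_emb, t_emb)`; the weighting here is likewise an illustrative assumption.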