Knowledge distillation transfers knowledge from a teacher network to a student network, with the goal of greatly improving the student network's performance. Previous methods mostly focus on proposing feature transformations and loss functions between features at the same level to improve effectiveness. We instead study the connection paths across levels between the teacher and student networks, and reveal their great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. Our new review mechanism is effective and structurally simple. Our final nested and compact framework requires negligible computation overhead and outperforms other methods on a variety of tasks. We apply our method to classification, object detection, and instance segmentation tasks, all of which witness significant improvements in student network performance. Code is available at https://github.com/Jia-Research-Lab/ReviewKD
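To make the cross-stage idea concrete, the following is a minimal, hypothetical sketch of a distillation loss in which each student stage is supervised not only by the teacher feature at the same level but also by the teacher's earlier (lower-level) features being "reviewed". This is an illustrative simplification, not the paper's actual design (which uses attention-based fusion and a hierarchical context loss); feature maps are flattened to plain float lists and all names are our own.

```python
def cross_stage_review_loss(student_feats, teacher_feats):
    """Hypothetical cross-stage distillation loss.

    student_feats, teacher_feats: lists of flattened feature vectors,
    one per network stage, assumed here to have matching shapes.
    Student stage i is matched against teacher stages 0..i, so deeper
    student features "review" the teacher's lower-level knowledge.
    """
    total = 0.0
    for i, s in enumerate(student_feats):
        # Pair stage i of the student with all teacher stages up to i.
        for t in teacher_feats[: i + 1]:
            # Mean-squared error between the two feature vectors.
            total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
    return total
```

In a real implementation the student and teacher features differ in spatial size and channel count, so each cross-stage pair would first pass through a learned alignment module before the loss is computed.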