In this paper, we consider transferring structure information from large networks to small ones for dense prediction tasks. Previous knowledge distillation strategies for dense prediction often directly borrow the distillation scheme designed for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to small networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation, which distills pairwise similarities by building a static graph, and ii) holistic distillation, which uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by extensive experiments on three dense prediction tasks: semantic segmentation, depth estimation, and object detection.
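To make the pair-wise scheme concrete, the following is a minimal NumPy sketch of one plausible form of pair-wise similarity distillation: each feature map is turned into a graph of cosine similarities between spatial locations, and the student is penalized for deviating from the teacher's graph. Function names, shapes, and the squared-error matching term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pairwise_similarity(feat):
    # feat: (C, H, W) feature map. Flatten the spatial positions and
    # compute the cosine similarity between every pair of locations,
    # yielding an (N, N) affinity graph with N = H * W.
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)                               # (C, N)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    return f.T @ f                                           # (N, N)

def pairwise_distillation_loss(student_feat, teacher_feat):
    # Illustrative matching term: mean squared error between the
    # student's and teacher's similarity graphs. Channel counts may
    # differ; only the spatial resolutions must agree.
    a_s = pairwise_similarity(student_feat)
    a_t = pairwise_similarity(teacher_feat)
    return float(np.mean((a_s - a_t) ** 2))

# Toy usage: small random maps stand in for backbone features.
rng = np.random.default_rng(0)
student = rng.standard_normal((64, 8, 8))    # fewer channels (small net)
teacher = rng.standard_normal((256, 8, 8))   # more channels (large net)
loss = pairwise_distillation_loss(student, teacher)
```

Because the loss compares similarity graphs rather than raw features, the student is free to use a different channel width than the teacher, which is exactly what makes this kind of structure transfer attractive for compact dense-prediction networks.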