Multi-label image classification is a fundamental yet challenging task toward general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) to learn object-level visual features effectively. In this paper, we propose a novel and efficient deep framework that boosts multi-label classification by distilling knowledge from a weakly-supervised detection task, without requiring bounding box annotations. Specifically, given only image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module, in which the WSD model guides the classification model according to both the class-level predictions for the whole image and the object-level visual features for object RoIs. The WSD model serves as the teacher and the classification model as the student. After this cross-task knowledge distillation, the classification model's performance improves significantly while its efficiency is preserved, since the WSD model can be safely discarded at test time. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework outperforms state-of-the-art methods in both accuracy and efficiency.
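The abstract names two distillation signals: class-level predictions for the whole image and object-level visual features for RoIs. Below is a minimal sketch, assuming PyTorch, of how such a two-part distillation objective could be combined with the supervised multi-label loss. The function name, the temperature `T`, the weights `alpha` and `beta`, and the tensor shapes are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a cross-task distillation objective, assuming PyTorch.
# The teacher (WSD model) supplies logits and RoI features; the student
# (classification model) is trained to match both while also fitting the
# ground-truth image-level labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_feats, teacher_feats,
                      labels, T=2.0, alpha=0.5, beta=0.1):
    # (1) Supervised multi-label loss against image-level annotations.
    bce = F.binary_cross_entropy_with_logits(student_logits, labels)

    # (2) Class-level distillation: the student's per-class predictions
    #     are pulled toward the teacher's temperature-softened predictions.
    soft_teacher = torch.sigmoid(teacher_logits / T)
    soft_student = torch.sigmoid(student_logits / T)
    class_distill = F.binary_cross_entropy(soft_student, soft_teacher)

    # (3) Object-level distillation: L2 matching of RoI visual features.
    feat_distill = F.mse_loss(student_feats, teacher_feats)

    return bce + alpha * class_distill + beta * feat_distill

# Toy usage: batch of 4 images, 80 classes (as in MS-COCO), 256-d features.
logits_s, logits_t = torch.randn(4, 80), torch.randn(4, 80)
feats_s, feats_t = torch.randn(4, 256), torch.randn(4, 256)
labels = torch.randint(0, 2, (4, 80)).float()
loss = distillation_loss(logits_s, logits_t, feats_s, feats_t, labels)
```

At test time only the student forward pass is needed, which is consistent with the claim that the WSD teacher can be discarded without affecting inference cost.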