Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Despite impressive progress, current multi-label image recognition (MLR) algorithms depend heavily on large-scale datasets with complete labels, and collecting such datasets is extremely time-consuming and labor-intensive. Training multi-label image recognition models with partial labels (MLR-PL) is an alternative, in which only some labels of each image are known while the others are unknown. However, current MLR-PL algorithms rely on pre-trained image similarity models or on iteratively updated image classification models to generate pseudo labels for the unknown labels. They therefore require a certain amount of annotation and inevitably suffer obvious performance drops, especially when the proportion of known labels is low. To address this dilemma, we propose dual-perspective semantic-aware representation blending (DSRB), which blends multi-granularity category-specific semantic representations across different images, from the instance and prototype perspectives respectively, to transfer information from known labels to complement unknown labels. Specifically, an instance-perspective representation blending (IPRB) module is designed to blend the representations of the known labels in one image with the representations of the corresponding unknown labels in another image, thereby complementing these unknown labels. Meanwhile, a prototype-perspective representation blending (PPRB) module is introduced to learn more stable representation prototypes for each category and to blend the representations of unknown labels with the prototypes of the corresponding labels, in a location-sensitive manner, to complement these unknown labels. Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed DSRB consistently outperforms current state-of-the-art algorithms across all known-label proportion settings.
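The two blending perspectives can be illustrated with a toy sketch. It is a minimal illustration under assumptions that the abstract does not state. Each image is represented by one category-specific feature vector per category. Labels use 1/-1/0 for known positive, known negative, and unknown. Blending is a convex combination with a fixed weight `alpha`. A category's prototype is simply the mean of its known-positive representations. All function names (`blend`, `instance_blend`, `mean_prototypes`, `prototype_blend`) are hypothetical. The sketch does not reproduce the paper's multi-granularity features, its prototype learning, or its location-sensitive blending.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical data layout (assumption, not from the paper):
# each image has one category-specific feature vector per category,
# and labels use 1 = known positive, -1 = known negative, 0 = unknown.
Vector = List[float]
Features = List[Vector]
Labels = List[int]


def blend(a: Vector, b: Vector, alpha: float) -> Vector:
    """Convex combination alpha * a + (1 - alpha) * b of two vectors."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def instance_blend(feats_a: Features, labels_a: Labels,
                   feats_b: Features, labels_b: Labels,
                   alpha: float = 0.5) -> Tuple[Features, Labels]:
    """Instance-perspective sketch: for each category that is unknown in
    image A but known in image B, mix B's known representation into A's
    unknown one and let the blended sample inherit B's label."""
    new_feats = [list(f) for f in feats_a]
    new_labels = list(labels_a)
    for c, (la, lb) in enumerate(zip(labels_a, labels_b)):
        if la == 0 and lb != 0:
            new_feats[c] = blend(feats_b[c], feats_a[c], alpha)
            new_labels[c] = lb
    return new_feats, new_labels


def mean_prototypes(dataset: Sequence[Tuple[Features, Labels]],
                    num_classes: int, dim: int) -> List[Optional[Vector]]:
    """Simplified prototype per category: the mean of all representations
    whose label is known positive (None if the category has no positives)."""
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feats, labels in dataset:
        for c in range(num_classes):
            if labels[c] == 1:
                counts[c] += 1
                sums[c] = [s + x for s, x in zip(sums[c], feats[c])]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(num_classes)]


def prototype_blend(feats: Features, labels: Labels,
                    prototypes: List[Optional[Vector]],
                    alpha: float = 0.5) -> Tuple[Features, Labels]:
    """Prototype-perspective sketch: mix each unknown category's
    representation with that category's prototype and mark it positive
    (assumption: a prototype stands for the category being present)."""
    new_feats = [list(f) for f in feats]
    new_labels = list(labels)
    for c, label in enumerate(labels):
        if label == 0 and prototypes[c] is not None:
            new_feats[c] = blend(prototypes[c], feats[c], alpha)
            new_labels[c] = 1
    return new_feats, new_labels


# Toy usage with 2 categories and 2-D features.
feats_a, labels_a = [[1.0, 0.0], [0.0, 1.0]], [1, 0]
feats_b, labels_b = [[3.0, 3.0], [2.0, 4.0]], [0, -1]

print(instance_blend(feats_a, labels_a, feats_b, labels_b))
# ([[1.0, 0.0], [1.0, 2.5]], [1, -1])

protos = mean_prototypes([(feats_a, labels_a), (feats_b, labels_b)], 2, 2)
print(prototype_blend(feats_b, labels_b, protos))
# ([[2.0, 1.5], [2.0, 4.0]], [1, -1])
```

The convex combination keeps each blended vector between its two sources, much like mixup. DSRB itself may sample, learn, or spatially vary the mixing weight differently.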

翻译：尽管在取得惊人进展的同时，当前的多标记图像识别（MLR）算法严重依赖于具有完整标签的大规模数据集，使得收集大规模数据集变得极其耗时和劳力密集。使用部分标签（MLR-PL）训练多标记图像识别模型是一种替代方法，其中仅一些标签已知，而其余标签对于每个图像来说未知。然而，目前的MLP-PL算法依赖于已预训练的图像相似性模型或迭代更新图像分类模型以为未知标签生成伪标签。因此，它们依赖于某些注释，并且不可避免地遭受明显的性能下降，特别是在已知标签比例低的情况下。为了解决这个困境，我们提出了一个双重视角语义感知表示融合（DSRB），它跨不同图像混合多层次类别特定的语义表示，从实例和原型角度，将已知标签的信息转移来补充未知标签。具体而言，设计了一个实例角度的表示融合（IPRB）模块，将图像中已知标签的表示与另一图像中相应的未知标签的表示混合以补充这些未知标签的表示。同时，引入了原型角度表示融合（PPRB）模块，为每个类别学习更稳定的表示原型，并以位置敏感的方式将未知标签的表示与相应标签的原型混合，以补充这些未知标签的表示。在MS-COCO、Visual Genome和Pascal VOC 2007数据集上进行的广泛实验表明，所提出的DSRB算法在所有已知标签比例设置下始终优于当前最先进的算法。