Searching by image is popular yet remains challenging because of the extensive interference arising from i) data variations (e.g., background, pose, viewing angle, brightness) in real-world captured images and ii) similar images in the query dataset. This paper studies a practically meaningful problem: beauty product retrieval (BPR) with neural networks. We extract a broad range of image feature types and ask whether these features help to i) suppress data variations in real-world captured images and ii) distinguish an image from others in the dataset that look very similar but show intrinsically different beauty products, thereby enhancing BPR capability. To answer this question, we present VM-Net, a novel variable-attention neural network designed to understand the combination of multiple features of beauty product images. Because few training datasets for BPR have been publicly released, we establish a new dataset of more than one million images classified into more than 20K categories to improve the generalization and anti-interference abilities of both VM-Net and other methods. We evaluate VM-Net and its competitors on the benchmark dataset Perfect-500K, where VM-Net shows clear improvements over the competitors in terms of MAP@7. The source code and dataset will be released upon publication.
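The abstract reports results in terms of MAP@7 without defining it. As a minimal sketch, assuming the common definition of mean average precision at a cutoff of 7 (the exact normalization used by the Perfect-500K benchmark is not stated here and may differ), the metric could be computed as follows. The function names and the normalization by `min(len(relevant), k)` are illustrative assumptions, not taken from the paper.

```python
from typing import Hashable, Sequence, Set


def average_precision_at_k(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 7
) -> float:
    """Average precision over the top-k retrieved items for one query.

    Assumption: the score is normalized by min(len(relevant), k), a common
    convention; the benchmark's official normalization may differ.
    """
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(ranked[:k], start=1):
        # Count each relevant item once, even if it is retrieved twice.
        if item in relevant and item not in seen:
            seen.add(item)
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision_at_k(
    all_ranked: Sequence[Sequence[Hashable]],
    all_relevant: Sequence[Set[Hashable]],
    k: int = 7,
) -> float:
    """Mean of the per-query average precision at cutoff k."""
    if len(all_ranked) != len(all_relevant):
        raise ValueError("need one relevant set per ranked list")
    if not all_ranked:
        return 0.0
    total = sum(
        average_precision_at_k(ranked, relevant, k)
        for ranked, relevant in zip(all_ranked, all_relevant)
    )
    return total / len(all_ranked)
```

Under this definition, a query whose single correct product is returned at rank 1 scores 1.0, and one whose correct product is returned at rank 2 scores 0.5, so the two queries together yield a MAP@7 of 0.75.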