Computer vision has established a foothold in the online fashion retail industry. Main product detection is a crucial step of vision-based fashion product feed parsing pipelines. It focuses on identifying, across the gallery of images on a product page, the bounding boxes that contain the product being sold. The current state-of-the-art approach does not leverage the relations between regions in an image and treats images of the same product independently, so it does not fully exploit visual and product contextual information. In this paper we propose a model that incorporates Graph Convolutional Networks (GCNs) to jointly represent all bounding boxes detected in the gallery as nodes. We show that the proposed method outperforms the state of the art, and that the margin is large when the title input is missing at inference time and in cross-dataset evaluation.
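As a rough illustration of treating every detected box in a gallery as a graph node, the following is a minimal NumPy sketch of a standard GCN propagation rule with symmetric normalization, ReLU(D^-1/2 (A + I) D^-1/2 X W). It is not the authors' model. It assumes a fully connected graph between all boxes in the gallery, random vectors standing in for per-box detector features, and a hypothetical sigmoid scoring head (`score_gallery_boxes`); the title input used by the paper is omitted entirely.

```python
import numpy as np


def gcn_layer(features, adjacency, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)                          # node degrees of A + I
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt       # symmetric normalization
    return np.maximum(norm_adj @ features @ weight, 0.0)


def score_gallery_boxes(box_features, weights, w_out):
    """Hypothetical main-product score in (0, 1) for every box in a gallery.

    Assumption: every box is connected to every other box, regardless of
    which gallery image it was detected in.
    """
    n = box_features.shape[0]
    adjacency = np.ones((n, n)) - np.eye(n)  # fully connected, no self-edges
    h = box_features
    for w in weights:                        # stack of GCN layers
        h = gcn_layer(h, adjacency, w)
    logits = h @ w_out                       # one logit per box
    return 1.0 / (1.0 + np.exp(-logits.ravel()))


# Usage: 7 boxes detected across a product gallery, 16-d features each
# (random placeholders, not real detector outputs).
rng = np.random.default_rng(0)
boxes = rng.normal(size=(7, 16))
weights = [rng.normal(scale=0.1, size=(16, 32)),
           rng.normal(scale=0.1, size=(32, 32))]
w_out = rng.normal(scale=0.1, size=(32, 1))
scores = score_gallery_boxes(boxes, weights, w_out)
```

Because each layer mixes a box's features with those of all other boxes in the gallery, a box's score can depend on what was detected in the other images of the same product, which is the kind of cross-image context the abstract says independent per-image processing misses.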