视觉问题解答的不同关注 (Differential Attention for Visual Question Answering)

In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that the previous systems focus on are not correlated with the regions that humans focus on. The accuracy is limited due to this drawback. In this paper, we propose to solve this problem by using an exemplar based method. We obtain one or more supporting and opposing exemplars to obtain a differential attention region. This differential attention is closer to human attention than other image based attention methods. It also helps in obtaining improved accuracy when answering questions. The method is evaluated on challenging benchmark datasets. We perform better than other image based attention methods and are competitive with other state of the art methods that focus on both image and questions.

翻译：在本文中,我们的目标是在为一些图像提供一组问答数据集时,回答基于图像的问题。一些方法侧重于通过利用图像关注来解决这一问题。在回答问题时,重点是图像的一个特定部分。人类也这样做。然而,以前系统关注的区域与人类关注的区域没有关联。由于这一缺陷,准确性有限。我们在本文件中建议使用一个基于实例的方法来解决这一问题。我们获得了一个或多个支持和反对示例,以获得不同的关注区域。这种关注比基于图像的其他方法更接近于人类关注区域。在回答问题时,还有助于提高准确性。该方法以具有挑战性的基准数据集来评估。我们比其他基于关注的方法要好,并且与其他侧重于图像和问题的艺术方法的状态相比,我们表现得更好,并且具有竞争力。

相关内容

注意力机制

关注 120

Attention机制最早是在视觉图像领域提出来的，但是真正火起来应该算是google mind团队的这篇论文《Recurrent Models of Visual Attention》[14]，他们在RNN模型上使用了attention机制来进行图像分类。随后，Bahdanau等人在论文《Neural Machine Translation by Jointly Learning to Align and Translate》 [1]中，使用类似attention的机制在机器翻译任务上将翻译和对齐同时进行，他们的工作算是是第一个提出attention机制应用到NLP领域中。接着类似的基于attention机制的RNN模型扩展开始应用到各种NLP任务中。最近，如何在CNN中使用attention机制也成为了大家的研究热点。下图表示了attention研究进展的大概趋势。

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

专知会员服务

37+阅读 · 2020年3月27日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

【综述】图像去噪的深度学习:综述，36页pdf，Deep Learning on Image Denoising: An overview

专知会员服务

71+阅读 · 2019年12月31日

【AAAI2020】用于视觉对话中深度视觉理解的自适应双向编码模型（DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue）, 中科院信工所于静等

专知会员服务

29+阅读 · 2019年11月23日