Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks

The robust and efficient recognition of visual relations in images is a hallmark of biological vision. Here, we argue that, despite recent progress in visual recognition, modern machine vision algorithms are severely limited in their ability to learn visual relations. Through controlled experiments, we demonstrate that visual-relation problems strain convolutional neural networks (CNNs). The networks eventually break altogether when rote memorization becomes impossible, such as when the intra-class variability exceeds their capacity. We further show that another type of feedforward network, called a relational network (RN), which was shown to successfully solve seemingly difficult visual question answering (VQA) problems on the CLEVR datasets, suffers similar limitations. Motivated by the comparable success of biological vision, we argue that feedback mechanisms including working memory and attention are the key computational components underlying abstract visual reasoning.
