Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations - Zhuanzhi Papers

Tags: Multimodal · Attention Mechanism · INFORMS · Performer · Object Detection

April 5, 2021

Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations

Meng-Jiun Chiou, Roger Zimmermann, Jiashi Feng

From arXiv; published in IEEE Access

Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years. Inspired by human reasoning mechanisms, it is believed that external visual commonsense knowledge is beneficial for reasoning visual relationships of objects in images, which is however rarely considered in existing methods. In this paper, we propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT), which performs relational reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training with multimodal representations. RVL-BERT also uses an effective spatial module and a novel mask attention module to explicitly capture spatial information among the objects. Moreover, our model decouples object detection from visual relationship recognition by taking in object names directly, enabling it to be used on top of any object detection system. We show through quantitative and qualitative experiments that, with the transferred knowledge and novel modules, RVL-BERT achieves competitive results on two challenging visual relationship detection datasets. The source code is available at https://github.com/coldmanck/RVL-BERT.
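The abstract mentions a spatial module and a detector-agnostic design (object names in, relationships out) but gives no details. As a rough illustration only, the Python sketch below enumerates ordered subject-object pairs from the output of an arbitrary detector and encodes each pair's geometry with a common box-delta style encoding (corner offsets normalized by the subject box, plus log width and height ratios). This encoding is an assumption chosen for illustration, not necessarily what RVL-BERT's spatial module computes, and the `Detection` type and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import permutations
from typing import Iterator, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    """Hypothetical container for one detector output: a class name and a box."""
    name: str
    box: Box


def relative_spatial_features(subj_box: Box, obj_box: Box) -> List[float]:
    """Encode the object box relative to the subject box.

    Assumption for illustration: a box-delta style (dx, dy, log dw, log dh)
    encoding, not necessarily RVL-BERT's actual spatial module.
    """
    sx1, sy1, sx2, sy2 = subj_box
    ox1, oy1, ox2, oy2 = obj_box
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    if min(sw, sh, ow, oh) <= 0:
        raise ValueError("boxes must have positive width and height")
    return [
        (ox1 - sx1) / sw,   # horizontal corner offset, in subject widths
        (oy1 - sy1) / sh,   # vertical corner offset, in subject heights
        math.log(ow / sw),  # log width ratio
        math.log(oh / sh),  # log height ratio
    ]


def candidate_pairs(
    detections: Sequence[Detection],
) -> Iterator[Tuple[str, str, List[float]]]:
    """Yield (subject name, object name, spatial features) for every ordered pair.

    Only names and boxes are consumed, so any detector's output can be plugged in.
    """
    for subj, obj in permutations(detections, 2):
        yield subj.name, obj.name, relative_spatial_features(subj.box, obj.box)


if __name__ == "__main__":
    dets = [
        Detection("person", (10, 20, 110, 220)),
        Detection("horse", (60, 120, 260, 320)),
    ]
    for s, o, feats in candidate_pairs(dets):
        print(s, o, [round(v, 3) for v in feats])
```

For a `person` box (10, 20, 110, 220) and a `horse` box (60, 120, 260, 320), the person-to-horse features are approximately [0.5, 0.5, 0.693, 0.0]. Ordered pairs are used because relationships such as "person riding horse" are asymmetric, so each direction gets its own features.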

Related Content

Multimodal

[ECCV2020 - University of Oxford] Audio-Visual Object Structuring in Video Based on Self-Supervised Learning

Zhuanzhi Member Services

19+ reads · August 11, 2020

[IJCAI 2019] Multi-view Knowledge Graph Embedding for Entity Alignment

Zhuanzhi Member Services

58+ reads · June 30, 2020

[IJCAI2020] TransOMCS: From Linguistic Graphs to Commonsense Knowledge

Zhuanzhi Member Services

25+ reads · May 6, 2020

[Paper Recommendation] End-to-End Entity Classification on Multimodal Knowledge Graphs

Zhuanzhi Member Services

49+ reads · March 30, 2020

[CVPR2020] Weakly-Supervised Salient Object Detection via Scribble Annotations

Zhuanzhi Member Services

38+ reads · March 19, 2020

[Xiamen University - CVPR2020] Harmonizing Transferability and Discriminability for Adapting Object Detectors

Zhuanzhi Member Services

25+ reads · March 16, 2020

[Google: Unsupervised Large-Scale Visual Representation Transfer] Large Scale Learning of General Visual Representations for Transfer

Zhuanzhi Member Services

10+ reads · January 7, 2020

[Representation Learning] Selected Readings: 8 NeurIPS 2019 Papers

Zhuanzhi Member Services

53+ reads · December 22, 2019

[ACL 2019 Tutorials] Storytelling from Structured Data and Knowledge Graphs: An NLG Perspective

Zhuanzhi Member Services

25+ reads · November 18, 2019

[Survey] Scene Text Detection and Recognition with Deep Learning

Zhuanzhi Member Services

77+ reads · October 10, 2019

Hierarchically Structured Meta-learning

CreateAMind

23+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

25+ reads · May 18, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

41+ reads · January 3, 2019

disentangled-representation-papers

CreateAMind

26+ reads · September 12, 2018

VAE-Related Papers on Representation Learning, Part 1

CreateAMind

12+ reads · September 6, 2018

Hierarchical Disentangled Representations

CreateAMind

4+ reads · April 15, 2018

[Paper Recommendations] Seven Recent Knowledge Graph Papers: Embedded Knowledge, Zero-shot Recognition, Knowledge Graph Embedding, Network Libraries, Variational Inference, Explanation, Weak Supervision

Zhuanzhi

19+ reads · March 26, 2018

[Paper Recommendations] Six Recent Object Detection Papers: Object Linking, Mobile Devices, 3D Maps, Aerial Images, Detection and Pose Estimation

Zhuanzhi

8+ reads · February 5, 2018

Capsule Networks Explained

Machine Learning Research Society

10+ reads · November 12, 2017

[ICCV 2017 Proceedings] ICCV 2017, Top Computer Vision Conference: Open Access Repository

Zhuanzhi

6+ reads · October 14, 2017

ERNIE: Enhanced Language Representation with Informative Entities

Arxiv

5+ reads · May 17, 2019

Exploring the Semantics for Visual Relationship Detection

Arxiv

3+ reads · April 3, 2019

Knowledge Representation Learning: A Quantitative Review

Arxiv

27+ reads · December 28, 2018

Exploring Visual Relationship for Image Captioning

Arxiv

14+ reads · September 19, 2018

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+ reads · May 24, 2018

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+ reads · April 8, 2018

Representation Learning for Visual-Relational Knowledge Graphs

Arxiv

9+ reads · March 31, 2018

Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection

Arxiv

9+ reads · March 13, 2018

Natural Language Guided Visual Relationship Detection

Arxiv

3+ reads · November 21, 2017

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Arxiv

3+ reads · August 3, 2017
