ClawCraneNet: 将基于文本的视频分割的对象级关系杠杆化 (ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation) - 专知论文

会员服务 ·

0

INTERACT · Extensibility · Guidance · 语言表示 · ConvNets ·

2021 年 6 月 5 日

ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation

翻译：ClawCraneNet: 将基于文本的视频分割的对象级关系杠杆化

Chen Liang,Yu Wu,Yawei Luo,Yi Yang

Text-based video segmentation is a challenging task that segments out the natural language referred objects in videos. It essentially requires semantic comprehension and fine-grained video understanding. Existing methods introduce language representation into segmentation models in a bottom-up manner, which merely conducts vision-language interaction within local receptive fields of ConvNets. We argue that such interaction is not fulfilled since the model can barely construct region-level relationships given partial observations, which is contrary to the description logic of natural language/referring expressions. In fact, people usually describe a target object using relations with other objects, which may not be easily understood without seeing the whole video. To address the issue, we introduce a novel top-down approach by imitating how we human segment an object with the language guidance. We first figure out all candidate objects in videos and then choose the refereed one by parsing relations among those high-level objects. Three kinds of object-level relations are investigated for precise relationship understanding, i.e., positional relation, text-guided semantic relation, and temporal relation. Extensive experiments on A2D Sentences and J-HMDB Sentences show our method outperforms state-of-the-art methods by a large margin. Qualitative results also show our results are more explainable. Besides, based on the inspiration, we win the first place in CVPR2021 Referring Youtube-VOS challenge.

翻译：以文字为基础的视频分割是一项具有挑战性的任务,它将自然语言在视频中指向对象。它基本上需要语义理解和精细的视频理解。现有的方法将语言代表引入自下而上的方式将语言代表引入分割模式, 仅仅在ConvNets 的当地可接受域内进行视觉语言互动。我们争辩说, 这种互动没有实现, 因为根据部分观察, 模型几乎无法构建区域层面的关系, 这与自然语言/ 引用表达表达方式的描述逻辑相悖。事实上, 人们通常使用与其他对象的关系描述目标对象, 不看整个视频可能很容易理解这些对象。为了解决这个问题, 我们引入了一种新的自上而下的方法, 模仿我们如何以语言指导的方式将人类部分作为对象。我们首先在视频中找出所有候选对象, 然后通过区分这些高层次对象之间的关系来选择被引用的。三种目标级别关系被调查为精确的关系理解, 例如, 定位关系, 文本引导的语义关系, 以及时间关系。在 A2D 句和 J- HMDB 版本中进行广泛的实验, 展示我们基于 C 的双曲线的双曲线的结果。

0

相关内容

INTERACT

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【视频目标检测与跟踪：综述论文】Video Object Segmentation and Tracking: A Survey

专知会员服务

66+阅读 · 2020年6月4日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

专知会员服务

52+阅读 · 2020年4月15日

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

专知会员服务

36+阅读 · 2020年3月12日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

语义分割 | context relation

语义分割 | context relation

极市平台

8+阅读 · 2019年2月9日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

论文浅尝 | EARL: Joint Entity and Relation Linking for QA over KG

论文浅尝 | EARL: Joint Entity and Relation Linking for QA over KG

开放知识图谱

6+阅读 · 2018年10月30日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

Relation Networks for Object Detection 论文笔记

Relation Networks for Object Detection 论文笔记

统计学习与视觉计算组

16+阅读 · 2018年4月18日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

Full-Duplex Strategy for Video Object Segmentation

Full-Duplex Strategy for Video Object Segmentation

Arxiv

0+阅读 · 2021年8月6日

Knowledge-based Review Generation by Coherence Enhanced Text Planning

Arxiv

7+阅读 · 2021年5月9日

Learning Position and Target Consistency for Memory-based Video Object Segmentation

Arxiv

3+阅读 · 2021年4月9日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Semantic Grouping Network for Video Captioning

Arxiv

8+阅读 · 2021年2月1日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Arxiv

8+阅读 · 2020年12月7日

Dual Temporal Memory Network for Efficient Video Object Segmentation

Dual Temporal Memory Network for Efficient Video Object Segmentation

Arxiv

5+阅读 · 2020年3月13日

Exploring the Semantics for Visual Relationship Detection

Arxiv

3+阅读 · 2019年4月3日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

VIP会员

文章信息

相关主题

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【视频目标检测与跟踪：综述论文】Video Object Segmentation and Tracking: A Survey

专知会员服务

66+阅读 · 2020年6月4日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

专知会员服务

52+阅读 · 2020年4月15日

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

专知会员服务

36+阅读 · 2020年3月12日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

视觉-语言-动作模型解析：从模块构成到里程碑与挑战

《解析陆域作战方向：一个概念性框架》报告

【博士论文】基于多模态基础模型的上下文学习

追寻真正的AI自主性：从遗留思维到战场优势

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

语义分割 | context relation

语义分割 | context relation

极市平台

8+阅读 · 2019年2月9日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

论文浅尝 | EARL: Joint Entity and Relation Linking for QA over KG

论文浅尝 | EARL: Joint Entity and Relation Linking for QA over KG

开放知识图谱

6+阅读 · 2018年10月30日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

Relation Networks for Object Detection 论文笔记

Relation Networks for Object Detection 论文笔记

统计学习与视觉计算组

16+阅读 · 2018年4月18日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

Full-Duplex Strategy for Video Object Segmentation

Full-Duplex Strategy for Video Object Segmentation

Arxiv

0+阅读 · 2021年8月6日

Knowledge-based Review Generation by Coherence Enhanced Text Planning

Arxiv

7+阅读 · 2021年5月9日

Learning Position and Target Consistency for Memory-based Video Object Segmentation

Arxiv

3+阅读 · 2021年4月9日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Semantic Grouping Network for Video Captioning

Arxiv

8+阅读 · 2021年2月1日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Arxiv

8+阅读 · 2020年12月7日

Dual Temporal Memory Network for Efficient Video Object Segmentation

Dual Temporal Memory Network for Efficient Video Object Segmentation

Arxiv

5+阅读 · 2020年3月13日

Exploring the Semantics for Visual Relationship Detection

Arxiv

3+阅读 · 2019年4月3日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

微信扫码咨询专知VIP会员