Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World - Zhuanzhi Papers


March 23, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World


Qifan Yu,Juncheng Li,Yu Wu,Siliang Tang,Wei Ji,Yueting Zhuang

From arXiv, 21 pages, 16 figures

Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships from images for visual understanding. Although recent works have made steady progress on SGG, they still suffer from the long-tail distribution problem: tail predicates are more costly to train and harder to distinguish because they have less annotated data than frequent predicates. Existing re-balancing strategies try to handle it via prior rules but remain confined to pre-defined conditions, which do not scale across models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, in which a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. CaCao can be applied in a plug-and-play fashion and automatically strengthens existing SGG models to tackle the long-tailed problem. Building on this, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), in which models can generalize to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple scene graph generation models in a model-agnostic way. Moreover, Epic achieves competitive performance on open-world predicate prediction.
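
To make the long-tail issue concrete, the following minimal Python sketch represents relationships as <subject, predicate, object> triplets and counts how often each predicate is annotated. The `Triplet` type, the toy annotations, and the `max_count` threshold are illustrative assumptions; none of this is taken from the paper or from the CaCao implementation.

```python
from collections import Counter
from typing import NamedTuple


class Triplet(NamedTuple):
    """One <subject, predicate, object> relationship in a scene graph."""
    subject: str
    predicate: str
    object: str


# Hypothetical toy annotations; not drawn from any benchmark dataset.
annotations = [
    Triplet("man", "on", "horse"),
    Triplet("cup", "on", "table"),
    Triplet("dog", "on", "sofa"),
    Triplet("woman", "has", "hat"),
    Triplet("cat", "has", "tail"),
    Triplet("man", "riding", "horse"),
]


def predicate_frequencies(triplets):
    """Count how often each predicate label appears, most frequent first."""
    return Counter(t.predicate for t in triplets).most_common()


def tail_predicates(triplets, max_count=1):
    """Return predicates with at most `max_count` annotations (the 'tail')."""
    return sorted(p for p, n in predicate_frequencies(triplets) if n <= max_count)


freqs = predicate_frequencies(annotations)
tail = tail_predicates(annotations)
print(freqs)
print(tail)
```

In this toy set the coarse predicate "on" dominates, while the more informative "riding" has a single annotation. That imbalance is the kind of gap a predicate-boosting approach would aim to reduce by supplying additional fine-grained predicate labels.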

