Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions


April 13, 2023

Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

From arXiv; CVPR 2022 (oral). First two authors contributed equally. Code: https://github.com/NVlabs/Bongard-HOI

A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps. Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction, while even amateur human testers on MTurk reach 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.
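To make the hard-negative idea concrete, the following is a minimal sketch of how one few-shot instance of this kind might be assembled. It is not the authors' pipeline: the `Image` record, the `build_instance` function, the one-label-per-image simplification, and the number of shots are all assumptions made for illustration, since the abstract does not specify the data format or sampling procedure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical annotation record: one action-object pair per image."""
    image_id: str
    action: str  # e.g. "ride"
    obj: str     # e.g. "bicycle"


def build_instance(images, action, obj, shots, rng):
    """Sample one few-shot instance for the concept (action, obj).

    Positives show the target action on the target object. Hard negatives
    show the SAME object with a DIFFERENT action, so the two sets disagree
    only on the action label.
    """
    positives = [im for im in images if im.obj == obj and im.action == action]
    hard_negatives = [im for im in images if im.obj == obj and im.action != action]
    if len(positives) < shots + 1 or len(hard_negatives) < shots + 1:
        raise ValueError("not enough images for this concept")

    # Draw one extra image per side to serve as the held-out query.
    pos = rng.sample(positives, shots + 1)
    neg = rng.sample(hard_negatives, shots + 1)
    return {
        "positive_support": pos[:shots],
        "negative_support": neg[:shots],
        "queries": [(pos[shots], 1), (neg[shots], 0)],  # (image, binary label)
    }
```

Because every support and query image in such an instance contains the same object category, a model that only recognizes objects sees identical evidence on both sides and cannot do better than chance on the binary query; it has to attend to the interaction itself.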

