Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly - 专知 Papers

Visual question answering · Multimodal · Automatic question answering · MoDELS · Model evaluation

July 27, 2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

From arXiv; ECCV 2022. Code and models are available at https://github.com/facebookresearch/reliable_vqa

Machine learning has advanced dramatically, narrowing the accuracy gap to humans in multimodal tasks like visual question answering (VQA). However, while humans can say "I don't know" when they are uncertain (i.e., abstain from answering a question), such ability has been largely neglected in multimodal research, despite the importance of this problem to the usage of VQA in real settings. In this work, we promote a problem formulation for reliable VQA, where we prefer abstention over providing an incorrect answer. We first enable abstention capabilities for several VQA models, and analyze both their coverage, the portion of questions answered, and risk, the error on that portion. For that, we explore several abstention approaches. We find that although the best performing models achieve over 71% accuracy on the VQA v2 dataset, introducing the option to abstain by directly using a model's softmax scores limits them to answering less than 8% of the questions to achieve a low risk of error (i.e., 1%). This motivates us to utilize a multimodal selection function to directly estimate the correctness of the predicted answers, which we show can increase the coverage by, for example, 2.4x from 6.8% to 16.3% at 1% risk. While it is important to analyze both coverage and risk, these metrics have a trade-off which makes comparing VQA models challenging. To address this, we also propose an Effective Reliability metric for VQA that places a larger cost on incorrect answers compared to abstentions. This new problem formulation, metric, and analysis for VQA provide the groundwork for building effective and reliable VQA models that have the self-awareness to abstain if and only if they don't know the answer.
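The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes each question has a confidence score (for example, the model's maximum softmax probability) and a per-question accuracy in [0, 1], and that the model answers only when the confidence reaches a threshold. The function names, the toy data, and the exact form of the cost-penalized score are illustrative assumptions; they are not taken from the paper's released code, and the score is only one plausible way to place a larger cost on incorrect answers than on abstentions.

```python
from typing import Sequence, Tuple


def coverage_and_risk(
    confidences: Sequence[float], accuracies: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Answer only when confidence >= threshold; return (coverage, risk)."""
    answered = [a for c, a in zip(confidences, accuracies) if c >= threshold]
    # Coverage: portion of all questions that are answered.
    coverage = len(answered) / len(confidences)
    # Risk: average error on the answered portion (defined as 0.0 if empty).
    risk = sum(1.0 - a for a in answered) / len(answered) if answered else 0.0
    return coverage, risk


def coverage_at_risk(
    confidences: Sequence[float], accuracies: Sequence[float], max_risk: float
) -> float:
    """Largest coverage whose risk stays at or below max_risk.

    Sweeps the threshold over the observed confidence values.
    """
    best = 0.0
    for threshold in sorted(set(confidences)):
        coverage, risk = coverage_and_risk(confidences, accuracies, threshold)
        if risk <= max_risk and coverage > best:
            best = coverage
    return best


def cost_penalized_score(
    confidences: Sequence[float],
    accuracies: Sequence[float],
    threshold: float,
    cost: float = 1.0,
) -> float:
    """Hypothetical reliability-style score (an assumption, see lead-in).

    Answered with accuracy > 0 earns that accuracy, answered with
    accuracy 0 is charged `cost`, and an abstention earns 0.
    """
    total = 0.0
    for c, a in zip(confidences, accuracies):
        if c < threshold:
            continue  # abstained: contributes 0
        total += a if a > 0 else -cost
    return total / len(confidences)


# Toy data (made up for illustration): five questions.
confidences = [0.95, 0.90, 0.80, 0.60, 0.40]
accuracies = [1.0, 1.0, 0.0, 1.0, 0.0]

if __name__ == "__main__":
    print(coverage_and_risk(confidences, accuracies, threshold=0.5))
    print(coverage_at_risk(confidences, accuracies, max_risk=0.0))
    print(cost_penalized_score(confidences, accuracies, 0.5, cost=10.0))
    print(cost_penalized_score(confidences, accuracies, 0.85, cost=10.0))
```

On this toy data, a high cost makes the stricter threshold score better than the permissive one, which is the behavior the abstract asks of a reliability metric: abstaining should be preferred to answering incorrectly.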

Related content

Visual Question Answering

Visual Question Answering (VQA) is a learning task that involves both computer vision and natural language processing. The task is defined as follows: "A VQA system takes as input an image and a free-form, open-ended, natural-language question about the image and produces a natural-language answer as the output" [1]. Put simply, VQA means answering questions about a given image.
