The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color

October 15, 2021

Cory Paik, Stéphane Aroca-Ouellette, Alessandro Roncone, Katharina Kann

From arXiv. Accepted to EMNLP 2021, 9 pages.

Recent work has raised concerns about the inherent limitations of text-only pretraining. In this paper, we first demonstrate that reporting bias, the tendency of people to not state the obvious, is one of the causes of this limitation, and then investigate to what extent multimodal training can mitigate this issue. To accomplish this, we 1) generate the Color Dataset (CoDa), a dataset of human-perceived color distributions for 521 common objects; 2) use CoDa to analyze and compare the color distribution found in text, the distribution captured by language models, and a human's perception of color; and 3) investigate the performance differences between text-only and multimodal models on CoDa. Our results show that the distribution of colors that a language model recovers correlates more strongly with the inaccurate distribution found in text than with the ground-truth, supporting the claim that reporting bias negatively impacts and inherently limits text-only training. We then demonstrate that multimodal models can leverage their visual training to mitigate these effects, providing a promising avenue for future research.
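The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' code: the rank-correlation measure (Spearman), the color list, the object ("banana"), and every probability below are assumptions invented purely for illustration. The sketch only shows what it means for a model's color distribution to correlate more strongly with the distribution found in text than with human perception.

```python
from typing import List

# Hypothetical color vocabulary; the order fixes the vector layout below.
COLORS = ["yellow", "green", "brown", "red", "blue"]

# Invented distributions for a single object ("banana"); NOT data from CoDa.
human = [0.70, 0.20, 0.10, 0.00, 0.00]  # what people perceive
text = [0.15, 0.50, 0.30, 0.05, 0.00]   # what text mentions (the obvious goes unstated)
model = [0.20, 0.45, 0.25, 0.07, 0.03]  # what a text-only model might predict


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    print("model vs text :", round(spearman(model, text), 3))
    print("model vs human:", round(spearman(model, human), 3))
```

With these invented numbers the model's ranking of colors matches the text ranking more closely than the human one, which is the pattern the abstract reports for text-only models. A real analysis would average such a score over all objects in the dataset.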
