Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias - 专知 Papers


November 22, 2020

Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias


Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber

From arXiv, expanded version

Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. We apply this methodology to analyze gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators.
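The decomposition into direct and indirect effects mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard additive definitions of total, natural direct, and natural indirect effects from causal mediation analysis on a toy two-path model. The functions `mediator` and `output`, their coefficients, and the inputs are hypothetical stand-ins chosen for illustration; they are not the paper's models, datasets, or exact effect measures.

```python
import math


def mediator(x: float) -> float:
    """Hypothetical hidden unit (the mediator) computed from the input."""
    return math.tanh(2.0 * x)


def output(x: float, m: float) -> float:
    """Hypothetical model output: a direct path from x plus a path through m."""
    return 0.5 * x + 1.5 * m


def mediation_effects(x_base: float, x_treat: float) -> dict:
    """Compare a baseline input with an intervened input.

    total:    change the input and let the mediator respond.
    direct:   change the input but hold the mediator at its baseline value.
    indirect: keep the input at baseline but set the mediator to the value
              it would take under the intervened input.
    """
    m_base, m_treat = mediator(x_base), mediator(x_treat)
    y_base = output(x_base, m_base)
    return {
        "total": output(x_treat, m_treat) - y_base,
        "direct": output(x_treat, m_base) - y_base,
        "indirect": output(x_base, m_treat) - y_base,
    }


effects = mediation_effects(x_base=0.0, x_treat=1.0)
for name, value in effects.items():
    print(f"{name}: {value:.4f}")
```

In this toy model the output is additive in `x` and `m`, so the total effect equals the sum of the direct and indirect effects. In a real neural network, where the input and the mediator interact nonlinearly, that equality need not hold exactly.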


Related content

Latest "Transformers Models" tutorial, 64-page PPT

专知 Member Services

323+ reads · November 26, 2020

[Practical book] Machine Learning Quick Reference Handbook, 135-page PDF

专知 Member Services

127+ reads · November 20, 2020

Neural ordinary differential equations tutorial, 50-page PPT: A brief tutorial on Neural ODEs

专知 Member Services

74+ reads · August 2, 2020

Latest machine learning optimization course notes, 36-page PDF: Optimization for Machine Learning

专知 Member Services

170+ reads · May 10, 2020

[New book, 2020] Natural language processing in practice with Python and spaCy, 216-page PDF: NLP with Python

专知 Member Services

108+ reads · May 1, 2020

Causal Graphs, 52-page PPT

专知 Member Services

252+ reads · April 19, 2020

[Paper translation] Translation of a survey of attention mechanisms in NLP: Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知 Member Services

96+ reads · April 18, 2020

[ICCV 2019 Tutorial] Interpretable Machine Learning for Computer Vision

专知 Member Services

32+ reads · October 30, 2019

[Artificial intelligence in 2019: a year in review] Anti-AI, AI in 2019: A Year in Review

专知 Member Services

79+ reads · October 10, 2019

[CMU, Carnegie Mellon University] Deep learning in computer vision: methods, interpretation, causality, and fairness

专知 Member Services

83+ reads · October 9, 2019


Related papers

What Affects Team Behavior? Preliminary Linguistic Analysis of Communications in the Jazz Repository

arXiv

0+ reads · January 11, 2021

A negotiating protocol for group decision support systems

arXiv

0+ reads · January 10, 2021

A Tale of Fairness Revisited: Beyond Adversarial Learning for Deep Neural Network Fairness

arXiv

0+ reads · January 8, 2021

Pretrained Transformers for Text Ranking: BERT and Beyond

arXiv

28+ reads · October 13, 2020

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

arXiv

11+ reads · May 8, 2020

Latent Relation Language Models

arXiv

21+ reads · August 21, 2019

What Does BERT Look At? An Analysis of BERT's Attention

arXiv

4+ reads · June 11, 2019

Analysis Methods in Neural Language Processing: A Survey

arXiv

4+ reads · January 14, 2019

Context-Aware Neural Machine Translation Learns Anaphora Resolution

arXiv

3+ reads · May 25, 2018

A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects

arXiv

6+ reads · March 29, 2018


