Exploring Gender Bias in Retrieval Models - Zhuanzhi Papers

August 2, 2022

Exploring Gender Bias in Retrieval Models

Dhanasekar Sundararaman, Vivek Subramanian

Biases related to culture, gender, ethnicity, and other attributes have existed for decades and have affected many areas of human social interaction. These biases have been shown to affect machine learning (ML) models, and in natural language processing (NLP) this can have severe consequences for downstream tasks. Mitigating gender bias in information retrieval (IR) is important to avoid propagating stereotypes. In this work, we employ a dataset with two components: (1) the relevance of a document to a query and (2) the "gender" of a document, in which pronouns are replaced by male, female, and neutral forms. We show that pre-trained IR models perform poorly on zero-shot retrieval tasks when a large pre-trained BERT encoder is fully fine-tuned, and that lightweight fine-tuning with adapter networks improves zero-shot retrieval performance by almost 20% over the baseline. We also illustrate that pre-trained models have gender biases that cause retrieved articles to be male more often than female. We overcome this by introducing a debiasing technique that penalizes the model when it prefers males over females, resulting in an effective model that retrieves articles in a balanced fashion across genders.
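
The abstract does not give the form of the debiasing penalty, so the following is only a minimal sketch of one way such a term could look, not the authors' method. It assumes each document has a male and a female variant scored against the same query, and it adds a hinge penalty whenever the male variant scores higher. The names `hinge_gender_penalty`, `debiased_loss`, and the `weight` parameter are hypothetical.

```python
from typing import Sequence


def hinge_gender_penalty(male_scores: Sequence[float],
                         female_scores: Sequence[float]) -> float:
    """Mean of max(0, s_male - s_female) over paired document variants.

    Assumption: male_scores[i] and female_scores[i] are the model's
    relevance scores for the male and female variants of the same
    document, scored against the same query.
    """
    if len(male_scores) != len(female_scores):
        raise ValueError("score lists must be paired one-to-one")
    if not male_scores:
        return 0.0
    # Only pairs where the male variant is preferred contribute.
    gaps = [max(0.0, m - f) for m, f in zip(male_scores, female_scores)]
    return sum(gaps) / len(gaps)


def debiased_loss(relevance_loss: float,
                  male_scores: Sequence[float],
                  female_scores: Sequence[float],
                  weight: float = 1.0) -> float:
    """Add the weighted penalty to an already computed relevance loss.

    `weight` is a hypothetical hyperparameter trading off retrieval
    quality against balance across genders.
    """
    penalty = hinge_gender_penalty(male_scores, female_scores)
    return relevance_loss + weight * penalty
```

For example, with male scores `[0.5, 0.25]` and female scores `[0.25, 0.75]`, only the first pair is penalized, giving a penalty of `0.125`. In an actual training loop the same expression would be written with differentiable tensor operations so that gradients flow back into the retrieval model.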
