在使用以名称为基础的方法推断种族时避免偏见 (Avoiding bias when inferring race using name-based approaches) - 专知论文

会员服务 ·

0

INFORMS · 推断 · 有偏 · 欠估计 · 阈值 ·

2021 年 10 月 12 日

Avoiding bias when inferring race using name-based approaches

翻译：在使用以名称为基础的方法推断种族时避免偏见

Diego Kozlowski,Dakota S. Murray,Alexis Bell,Will Hulsey,Vincent Larivière,Thema Monroe-White,Cassidy R. Sugimoto

Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors' race, few large scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about authors, such as their names, to infer their perceived race. As with any other algorithm, the process of racial inference can generate biases if it is not carefully considered. The goal of this article is to assess the extent to which algorithmic bias is introduced using different approaches for name based racial inference. We use information from the U.S. Census and mortgage applications to infer the race of U.S. affiliated authors in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article lays the foundation for more systematic and less biased investigations into racial disparities in science.

翻译：学术界的种族差异是一个广泛公认的问题。对基于种族的系统性种族不平等的定量理解是迈向更公平研究制度的一个重要步骤。然而,由于缺乏关于作者种族的可靠信息,因此,很少对这一专题进行大规模分析。分析方法提供了一种解决办法,利用关于作者的已知信息,例如他们的姓名,推断他们认为的种族。与任何其他算法一样,种族推论过程如果不仔细考虑,就会产生偏见。本文章的目的是评估算法偏见在多大程度上采用不同方法来进行基于种族的推论。我们利用美国人口普查和抵押申请中的信息来推断科学网中的美国附属作者的种族。我们估计了使用特定名称和家族名称、阈值或连续分布以及估算的影响。我们的结果表明,基于名称推论的有效性因种族/族裔特征而不同,而且这一阈值会低估黑人作者和高估白人作者。我们最后提出了避免潜在偏见的建议。这一文章为更系统和较少偏差地调查科学中的种族差距奠定了基础。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【干货书】计算机科学，647页pdf，Computer Science

【干货书】计算机科学，647页pdf，Computer Science

专知会员服务

46+阅读 · 2021年5月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

专知会员服务

11+阅读 · 2019年11月14日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

人工智能 | ISAIR 2019诚邀稿件（推荐SCI期刊）

人工智能 | ISAIR 2019诚邀稿件（推荐SCI期刊）

Call4Papers

6+阅读 · 2019年4月1日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Inference for ROC Curves Based on Estimated Predictive Indices

Arxiv

0+阅读 · 2021年12月3日

How to Staff when Customers Arrive in Batches

Arxiv

0+阅读 · 2021年12月3日

Change-point detection in the covariance kernel of functional data using data depth

Arxiv

0+阅读 · 2021年12月2日

On the Documentation of Refactoring Types

Arxiv

0+阅读 · 2021年12月2日

Data Augmentation Approaches in Natural Language Processing: A Survey

Arxiv

18+阅读 · 2021年10月5日

Directional Bias Amplification

Arxiv

3+阅读 · 2021年2月24日

A Simple BERT-Based Approach for Lexical Simplification

A Simple BERT-Based Approach for Lexical Simplification

Arxiv

6+阅读 · 2019年7月16日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Arxiv

3+阅读 · 2018年9月15日

Causal Embeddings for Recommendation

Arxiv

23+阅读 · 2018年8月3日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】计算机科学，647页pdf，Computer Science

【干货书】计算机科学，647页pdf，Computer Science

专知会员服务

46+阅读 · 2021年5月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

【O'Reilly TensorFlow Conference 2019】TensorFlow，开源和IBM（TensorFlow, open source, and IBM ），IBM | Fred Reiss

专知会员服务

11+阅读 · 2019年11月14日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

无人机语义安全研究综述

【ICML2025】用于提升生成式口语语言模型自然度的变分框架

提示学习在计算机视觉中的分类、应用及展望

2025年专业服务领域生成式人工智能报告：为下一步战略应用GenAl做好准备

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

人工智能 | ISAIR 2019诚邀稿件（推荐SCI期刊）

人工智能 | ISAIR 2019诚邀稿件（推荐SCI期刊）

Call4Papers

6+阅读 · 2019年4月1日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Inference for ROC Curves Based on Estimated Predictive Indices

Arxiv

0+阅读 · 2021年12月3日

How to Staff when Customers Arrive in Batches

Arxiv

0+阅读 · 2021年12月3日

Change-point detection in the covariance kernel of functional data using data depth

Arxiv

0+阅读 · 2021年12月2日

On the Documentation of Refactoring Types

Arxiv

0+阅读 · 2021年12月2日

Data Augmentation Approaches in Natural Language Processing: A Survey

Arxiv

18+阅读 · 2021年10月5日

Directional Bias Amplification

Arxiv

3+阅读 · 2021年2月24日

A Simple BERT-Based Approach for Lexical Simplification

A Simple BERT-Based Approach for Lexical Simplification

Arxiv

6+阅读 · 2019年7月16日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Arxiv

3+阅读 · 2018年9月15日

Causal Embeddings for Recommendation

Arxiv

23+阅读 · 2018年8月3日

微信扫码咨询专知VIP会员