Rising concern that Natural Language Processing (NLP) technologies contain and perpetuate social biases has led to a rich and rapidly growing area of research. Gender bias is one of the central biases being analyzed, but to date there is no comprehensive analysis of how "gender" is theorized in the field. We survey nearly 200 articles concerning gender bias in NLP to discover how the field conceptualizes gender both explicitly (e.g., through definitions of terms) and implicitly (e.g., through how gender is operationalized in practice). To get a better idea of emerging trajectories of thought, we split these articles into two sections by time. We find that the majority of the articles do not make their theorization of gender explicit, even if they clearly define "bias." Almost none use a model of gender that is intersectional or inclusive of nonbinary genders, and many conflate sex characteristics, social gender, and linguistic gender in ways that disregard the existence and experience of trans, nonbinary, and intersex people. Statements acknowledging that gender is a complicated reality increase between the two time sections; however, very few articles manage to put this acknowledgment into practice. In addition to analyzing these findings, we provide specific recommendations to facilitate interdisciplinary work and to incorporate theory and methodology from Gender Studies. Our hope is that this will produce more inclusive gender bias research in NLP.