Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In this paper, we focus on NLP fairness in the context of India. We start with a brief account of the prominent axes of social disparities in India. We build resources for fairness evaluation in the Indian context and use them to demonstrate prediction biases along some of these axes. We then delve deeper into social stereotypes for Region and Religion, demonstrating their prevalence in corpora and models. Finally, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for Indian societal context, bridging technological gaps in NLP capabilities and resources, and adapting to Indian cultural values. While we focus on India, this framework can be generalized to other geo-cultural contexts.