Real-world datasets often encode stereotypes and societal biases. Such biases can be implicitly captured by trained models, leading to biased predictions and exacerbating existing societal preconceptions. Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias. However, a disconnect between fairness criteria and training objectives makes it difficult to reason theoretically about the effectiveness of different techniques. In this work, we propose two novel training objectives that directly optimise for the widely-used criterion of {\it equal opportunity}, and show that they are effective in reducing bias while maintaining high performance on two classification tasks.
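For concreteness, the equal opportunity criterion (Hardt et al., 2016) requires equal true positive rates across groups. Using standard notation, which the abstract itself does not introduce, with $\hat{Y}$ the predicted label, $Y$ the true label, and $A$ a protected attribute:
\[
P\left(\hat{Y} = 1 \,\middle|\, Y = 1,\, A = a\right) = P\left(\hat{Y} = 1 \,\middle|\, Y = 1,\, A = a'\right) \quad \text{for all groups } a, a'.
\]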