Machine learning models automatically learn discriminative features from data and are therefore susceptible to learning strongly correlated biases, such as relying on protected attributes like gender and race. Most existing bias mitigation approaches aim to explicitly reduce the model's focus on these protected features. In this work, we propose to mitigate bias by explicitly guiding the model's focus toward task-relevant features using domain knowledge, and we hypothesize that this can indirectly reduce the model's dependence on spurious correlations it learns from the data. We explore bias mitigation in facial expression recognition systems using facial Action Units (AUs) as the task-relevant features. To this end, we introduce the Feature-based Positive Matching Contrastive Loss, which learns the distances between a sample and its positives based on the similarity between their corresponding AU embeddings. We compare our approach against representative baselines and show that incorporating task-relevant features via our method can improve model fairness at minimal cost to classification performance.
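Since the abstract describes the loss only at a high level, the following is a minimal PyTorch sketch of one way such an objective could be realized: a supervised contrastive loss in which each positive pair's contribution is weighted by the cosine similarity of the two samples' AU embeddings, so positives with similar task-relevant features are pulled closer together. The function name, the temperature value, and the specific weighting scheme are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def feature_positive_matching_contrastive_loss(embeddings, au_embeddings, labels,
                                               temperature=0.07):
    """Contrastive loss whose positive pairs are weighted by AU-embedding similarity.

    embeddings:    (N, D) representations from the expression encoder
    au_embeddings: (N, K) task-relevant AU feature vectors for the same samples
    labels:        (N,)   expression class labels
    """
    z = F.normalize(embeddings, dim=1)
    logits = z @ z.t() / temperature                  # pairwise similarity logits

    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    # AU-based weights (assumed): positives with similar AU activations
    # receive a stronger pull than positives with dissimilar ones.
    au = F.normalize(au_embeddings, dim=1)
    au_sim = (au @ au.t()).clamp(min=0.0)             # cosine similarity, >= 0

    # Log-softmax over all non-self pairs (standard contrastive denominator).
    logits = logits.masked_fill(eye, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)         # avoid 0 * -inf = NaN below

    weights = au_sim * pos_mask                       # nonzero only for positives
    denom = weights.sum(dim=1).clamp(min=1e-8)
    loss_per_anchor = -(weights * log_prob).sum(dim=1) / denom

    has_pos = pos_mask.any(dim=1).float()             # anchors with >= 1 positive
    return (loss_per_anchor * has_pos).sum() / has_pos.sum().clamp(min=1.0)
```

The key design choice this sketch illustrates is that, unlike a standard supervised contrastive loss that treats all same-class positives equally, the AU-similarity weighting lets domain knowledge shape the embedding space, which is the mechanism the abstract attributes to the proposed loss.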