Deep neural networks are increasingly used in real-world applications (e.g., surveillance, face recognition). This has raised concerns about the fairness of the decisions made by these models. Various notions and measures of fairness have been proposed to ensure that a decision-making system does not disproportionately harm (or benefit) particular subgroups of the population. In this paper, we argue that traditional notions of fairness based only on a model's outputs are not sufficient when decision-making systems such as deep networks are vulnerable to adversarial attacks. We argue that in some cases it may be easier for an attacker to target a particular subgroup, resulting in a form of \textit{robustness bias}. We propose a new notion of \textit{adversarial fairness} that requires all subgroups to be equally robust to adversarial perturbations. We show that state-of-the-art neural networks can exhibit robustness bias on real-world datasets such as CIFAR10, CIFAR100, Adience, and UTKFace. We then formulate a measure of our proposed fairness notion and use it as a regularization term added to the traditional empirical risk minimization objective to decrease robustness bias. Through empirical evidence, we show that training with our proposed regularization term can partially mitigate adversarial unfairness while maintaining reasonable classification accuracy.
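The robustness bias described above can be made concrete with a toy example. The sketch below is illustrative only and not the paper's exact formulation: for a linear classifier $w \cdot x + b$, the minimal adversarial perturbation needed to flip a correctly classified point is its distance to the decision boundary, $|w \cdot x + b| / \|w\|$. A subgroup whose points lie closer to the boundary is easier to attack; the gap between subgroups' vulnerable fractions is one hypothetical way to quantify robustness bias. All names here (`vulnerable_fraction`, `robustness_bias`) are assumptions for illustration.

```python
import numpy as np

def boundary_distance(X, w, b):
    # minimal L2 perturbation to cross the boundary of a linear classifier
    return np.abs(X @ w + b) / np.linalg.norm(w)

def vulnerable_fraction(X, w, b, eps):
    # fraction of the subgroup an eps-bounded L2 attacker can flip
    return float(np.mean(boundary_distance(X, w, b) < eps))

rng = np.random.default_rng(0)
w, b = np.array([1.0, 1.0]), 0.0

# subgroup A sits far from the decision boundary; subgroup B hugs it
group_a = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(500, 2))
group_b = rng.normal(loc=[0.4, 0.4], scale=0.3, size=(500, 2))

eps = 0.5
frac_a = vulnerable_fraction(group_a, w, b, eps)
frac_b = vulnerable_fraction(group_b, w, b, eps)

# 0 would mean the two subgroups are equally robust; a large gap is
# the kind of disparity a fairness regularizer would penalize
robustness_bias = abs(frac_a - frac_b)
```

Even though both subgroups may be classified with equal accuracy, subgroup B here is far more vulnerable to small perturbations, which is precisely the disparity that output-based fairness notions miss.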