The right to be forgotten (RTBF) is motivated by the desire of people not to be perpetually disadvantaged by their past deeds. For this, data deletion needs to be deep and permanent: the data should also be removed from trained machine learning models. Researchers have proposed machine unlearning algorithms that aim to erase specific data from trained models more efficiently. However, these methods modify how data is fed into the model and how the model is trained, which may subsequently compromise AI ethics from the fairness perspective. To help software engineers make responsible decisions when adopting these unlearning methods, we present the first study of machine unlearning methods that reveals their fairness implications. We designed and conducted experiments on two typical machine unlearning methods (SISA and AmnesiacML) along with a retraining method (ORTR) as a baseline, using three fairness datasets under three different deletion strategies. Experimental results show that under non-uniform data deletion, SISA leads to better fairness than ORTR and AmnesiacML, whereas initial training and uniform data deletion do not necessarily affect the fairness of any of the three methods. These findings expose an important research problem in software engineering and can help practitioners better understand the potential fairness trade-offs when considering solutions for RTBF.