Recent advances in machine learning leverage massive datasets of unlabeled images from the web to learn general-purpose image representations for tasks ranging from image classification to face recognition. But do unsupervised computer vision models automatically learn implicit patterns and embed social biases that could have harmful downstream effects? We develop a novel method for quantifying biased associations between representations of social concepts and attributes in images. We find that state-of-the-art unsupervised models trained on ImageNet, a popular benchmark image dataset curated from internet images, automatically learn racial, gender, and intersectional biases. We replicate eight previously documented human biases from social psychology, from the innocuous, as with insects and flowers, to the potentially harmful, as with race and gender. Our results closely match three hypotheses about intersectional bias from social psychology. For the first time in unsupervised computer vision, we also quantify implicit human biases about weight, disabilities, and several ethnicities. When compared with statistical patterns in online image datasets, our findings suggest that machine learning models can automatically learn bias from the way people are stereotypically portrayed on the web.