Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis

Recent advances in deep learning and on-device inference could transform routine screening for skin cancers. Alongside the anticipated benefits of this technology, potential dangers arise from unforeseen and inherent biases. A significant obstacle is building evaluation datasets that accurately reflect key demographic attributes, including sex, age, and race, and that cover underrepresented groups. To address this, we train a state-of-the-art generative model to synthesize data in a controllable manner and use it to assess the fairness of publicly available skin cancer classifiers. To evaluate whether synthetic images can serve as a fairness testing dataset, we prepare a real-image dataset (MILK10K) as a benchmark and compare the True Positive Rate (TPR) of three models (DeepGuide, MelaNet, and SkinLesionDensnet). For each model, the classification tendencies observed on real and on generated images showed similar patterns across the different attribute subsets. We conclude that highly realistic synthetic images can facilitate model fairness verification.
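The comparison described in the abstract can be illustrated with a minimal sketch, assuming binary labels (1 = malignant) and one demographic attribute per image. The function names, the toy records, and the ranking-based notion of "similar pattern" are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import defaultdict

def tpr_by_group(records):
    """Return {group: TPR} from (group, y_true, y_pred) triples with binary labels."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:              # TPR only considers actual positives
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}

def tpr_gap(tpr):
    """Largest TPR difference between any two groups."""
    values = list(tpr.values())
    return max(values) - min(values)

def same_ranking(tpr_a, tpr_b):
    """True if both TPR tables order the shared groups the same way."""
    shared = sorted(set(tpr_a) & set(tpr_b))
    return sorted(shared, key=tpr_a.get) == sorted(shared, key=tpr_b.get)

# Hypothetical toy predictions of one classifier: (attribute, y_true, y_pred).
real = [
    ("male", 1, 1), ("male", 1, 1), ("male", 1, 0), ("male", 0, 0),
    ("female", 1, 1), ("female", 1, 0), ("female", 1, 0), ("female", 0, 1),
]
synthetic = [
    ("male", 1, 1), ("male", 1, 1), ("male", 1, 1), ("male", 1, 0),
    ("female", 1, 1), ("female", 1, 1), ("female", 1, 0), ("female", 1, 0),
]

tpr_real = tpr_by_group(real)
tpr_synth = tpr_by_group(synthetic)
print("real:", tpr_real, "gap:", tpr_gap(tpr_real))
print("synthetic:", tpr_synth, "gap:", tpr_gap(tpr_synth))
print("same group ranking:", same_ranking(tpr_real, tpr_synth))
```

Matching group rankings is only one possible way to read "similar patterns"; a rank correlation or a comparison of per-group TPR gaps would serve equally well when there are more than two groups.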
