Text-to-image diffusion models, such as Stable Diffusion, have demonstrated remarkable capabilities in generating high-quality and diverse images from natural language prompts. However, recent studies reveal that these models often replicate and amplify societal biases, particularly along demographic attributes like gender and race. In this paper, we introduce FairImagen (https://github.com/fuzihaofzh/FairImagen), a post-hoc debiasing framework that operates on prompt embeddings to mitigate such biases without retraining or modifying the underlying diffusion model. Our method integrates Fair Principal Component Analysis to project CLIP-based input embeddings into a subspace that minimizes group-specific information while preserving semantic content. We further enhance debiasing effectiveness through empirical noise injection and propose a unified cross-demographic projection method that enables simultaneous debiasing across multiple demographic attributes. Extensive experiments across gender, race, and intersectional settings demonstrate that FairImagen significantly improves fairness with a moderate trade-off in image quality and prompt fidelity. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for equitable text-to-image generation.
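As a rough illustration of what post-hoc debiasing on prompt embeddings means in practice, the sketch below removes group-separating directions from a CLIP prompt embedding before it is passed to the diffusion model. It is a simplified, assumption-laden sketch: it uses mean-difference directions in place of the paper's Fair PCA objective, and all names (`fair_projection`, `debias`, `noise_scale`) are hypothetical rather than part of the FairImagen release.

```python
# Minimal sketch of post-hoc prompt-embedding debiasing in the spirit of
# FairImagen. Names and the mean-difference construction are illustrative
# assumptions; the actual method uses Fair PCA on CLIP embeddings.
import numpy as np

def fair_projection(group_embeddings: dict[str, np.ndarray]) -> np.ndarray:
    """Build a projector that removes directions separating demographic groups.

    group_embeddings maps a group label (e.g. "group_a", "group_b") to an
    array of CLIP prompt embeddings with shape (n_samples, d).
    """
    # Directions that encode group identity: differences between group means.
    means = np.stack([e.mean(axis=0) for e in group_embeddings.values()])  # (k, d)
    bias_dirs = means[1:] - means[0]                                       # (k-1, d)
    # Orthonormal basis of the bias subspace via QR decomposition.
    q, _ = np.linalg.qr(bias_dirs.T)                                       # (d, k-1)
    # Projector onto the orthogonal complement of the bias subspace.
    return np.eye(q.shape[0]) - q @ q.T

def debias(prompt_emb: np.ndarray, P: np.ndarray, noise_scale: float = 0.0) -> np.ndarray:
    """Project a prompt embedding into the debiased subspace, optionally adding noise."""
    out = prompt_emb @ P.T
    if noise_scale > 0:
        # Empirical noise injection, as mentioned in the abstract.
        out = out + noise_scale * np.random.randn(*out.shape)
    return out
```

The debiased embedding would then replace the original prompt embedding in the (unmodified) diffusion pipeline, which is what makes the approach post-hoc and model-agnostic.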