Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a comprehensive diagnostic framework for evaluating social representation in video generation. Grounded in established social bias taxonomies, VideoBiasEval employs an event-based prompting strategy to disentangle semantic content (actions and contexts) from actor attributes (gender and ethnicity). It further introduces multi-granular metrics to evaluate (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and (4) the temporal persistence of bias within videos. Using this framework, we conduct the first end-to-end analysis connecting biases in human preference datasets, their amplification in reward models, and their propagation through alignment-tuned video diffusion models. Our results reveal that alignment tuning not only strengthens representational biases but also makes them temporally stable, producing smoother yet more stereotyped portrayals. These findings highlight the need for bias-aware evaluation and mitigation throughout the alignment process to ensure fair and socially responsible video generation.
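To make the four metric families named above more concrete, the following is a minimal sketch under stated assumptions, not the framework's actual implementation. The abstract does not define the metrics, so every function name here is hypothetical, as are the prompt template, the use of total variation distance for distributional shift, and the adjacent-frame agreement rate for temporal persistence. The sketch also assumes that per-video and per-frame attribute labels have already been produced by some external classifier.

```python
from collections import Counter, defaultdict


def build_prompt(action, context):
    # Hypothetical event-based template: the actor is left unspecified so
    # that the model's default choice of gender/ethnicity is what gets measured.
    return f"A person {action} {context}."


def distribution(labels):
    # Empirical distribution over attribute labels (e.g. perceived ethnicity).
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def gender_given_ethnicity(pairs):
    # Gender distribution conditioned on ethnicity, from (ethnicity, gender) pairs.
    grouped = defaultdict(list)
    for ethnicity, gender in pairs:
        grouped[ethnicity].append(gender)
    return {eth: distribution(genders) for eth, genders in grouped.items()}


def total_variation(p, q):
    # Assumed shift measure between two model variants: total variation
    # distance between their attribute distributions (0 = identical, 1 = disjoint).
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def temporal_persistence(frame_labels):
    # Assumed persistence measure: fraction of adjacent frame pairs whose
    # predicted attribute label is unchanged.
    if len(frame_labels) < 2:
        return 1.0
    same = sum(a == b for a, b in zip(frame_labels, frame_labels[1:]))
    return same / (len(frame_labels) - 1)
```

Under these assumptions, comparing `distribution(...)` for a base model and its alignment-tuned variant with `total_variation` would give one number per prompt set, while `temporal_persistence` would be computed per video and then averaged. The labels passed in are placeholders; how attributes are actually annotated is outside what the abstract specifies.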