Understanding what constitutes safety in AI-generated content is complex. While developers often rely on predefined taxonomies, real-world safety judgments also involve personal, social, and cultural perceptions of harm. This paper examines how annotators evaluate the safety of AI-generated images, focusing on the qualitative reasoning behind their judgments. Analyzing 5,372 open-ended comments, we find that annotators consistently invoke moral, emotional, and contextual reasoning that extends beyond structured safety categories. Many reflect on potential harm to others more than to themselves, grounding their judgments in lived experience, collective risk, and sociocultural awareness. Beyond individual perceptions, we also find that the structure of the task itself -- including annotation guidelines -- shapes how annotators interpret and express harm. Guidelines influence not only which images are flagged, but also the moral judgments annotators express in their justifications. Annotators frequently cite factors such as image quality, visual distortion, and mismatches between prompt and output as contributing to perceived harm -- dimensions that are often overlooked in standard evaluation frameworks. Our findings reveal that existing safety pipelines miss critical forms of reasoning that annotators bring to the task. We argue for evaluation designs that scaffold moral reflection, differentiate types of harm, and make space for subjective, context-sensitive interpretations of AI-generated content.