Recent advances in Large Language Models (LLMs) have enabled human-like responses across various tasks, raising questions about their ethical decision-making capabilities and potential biases. This study systematically evaluates how nine popular LLMs (both open-source and closed-source) respond to ethical dilemmas involving protected attributes. Across 50,400 trials spanning single and intersectional attribute combinations in four dilemma scenarios (protective vs. harmful), we assess the models' ethical preferences, sensitivity, stability, and clustering patterns. Results reveal significant biases with respect to protected attributes in all models, with preferences differing by model type and dilemma context. Notably, open-source LLMs show stronger preferences for marginalized groups and greater sensitivity in harmful scenarios, while closed-source models are more selective in protective situations and tend to favor mainstream groups. We also find that ethical behavior varies across dilemma types: LLMs maintain consistent patterns in protective scenarios but produce more diverse and cognitively demanding decisions in harmful ones. Furthermore, models display more pronounced ethical tendencies under intersectional conditions than in single-attribute settings, suggesting that complex inputs reveal deeper biases. These findings highlight the need for multi-dimensional, context-aware evaluation of LLMs' ethical behavior and offer a systematic approach to understanding and addressing fairness in LLM decision-making.