Recent advances in Large Language Models (LLMs) have enabled human-like responses across various tasks, raising questions about their ethical decision-making capabilities and potential biases. This study systematically evaluates how nine popular LLMs (both open-source and closed-source) respond to ethical dilemmas involving protected attributes. Across 50,400 trials spanning single and intersectional attribute combinations in four dilemma scenarios (protective vs. harmful), we assess the models' ethical preferences, sensitivity, stability, and clustering patterns. Results reveal significant biases involving protected attributes across all models, with preferences differing by model type and dilemma context. Notably, open-source LLMs show stronger preferences for marginalized groups and greater sensitivity in harmful scenarios, while closed-source models are more selective in protective situations and tend to favor mainstream groups. We also find that ethical behavior varies across dilemma types: LLMs maintain consistent patterns in protective scenarios but respond with more diverse and cognitively demanding decisions in harmful ones. Furthermore, models display more pronounced ethical tendencies under intersectional conditions than in single-attribute settings, suggesting that complex inputs reveal deeper biases. These findings highlight the need for multi-dimensional, context-aware evaluation of LLMs' ethical behavior and offer a systematic approach to understanding and addressing fairness in LLM decision-making.