Recent advances in Large Language Models (LLMs) have enabled human-like responses across various tasks, raising questions about their ethical decision-making capabilities and potential biases. This study systematically evaluates how nine popular LLMs (both open-source and closed-source) respond to ethical dilemmas involving protected attributes. Across 50,400 trials spanning single and intersectional attribute combinations in four dilemma scenarios (protective vs. harmful), we assess the models' ethical preferences, sensitivity, stability, and clustering patterns. Results reveal significant biases with respect to protected attributes in all models, with preferences differing by model type and dilemma context. Notably, open-source LLMs show stronger preferences for marginalized groups and greater sensitivity in harmful scenarios, while closed-source models are more selective in protective situations and tend to favor mainstream groups. We also find that ethical behavior varies across dilemma types: LLMs maintain consistent patterns in protective scenarios but produce more diverse and cognitively demanding decisions in harmful ones. Furthermore, models display more pronounced ethical tendencies under intersectional conditions than in single-attribute settings, suggesting that complex inputs reveal deeper biases. These findings highlight the need for multi-dimensional, context-aware evaluation of LLMs' ethical behavior and offer a systematic approach to understanding and addressing fairness in LLM decision-making.