Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?

In artificial intelligence (AI) alignment research, instrumental goals, also called instrumental subgoals or instrumental convergent goals, are widely associated with advanced AI systems. These goals, which include tendencies such as power-seeking and self-preservation, become problematic when they conflict with human aims. Conventional alignment theory treats instrumental goals as sources of risk that become problematic through failure modes such as reward hacking or goal misgeneralization, and attempts to limit the symptoms of instrumental goals, notably resource acquisition and self-preservation. This article proposes an alternative framing: a philosophical argument can be constructed according to which instrumental goals are understood as features to be accepted and managed rather than failures to be limited. Drawing on Aristotle's ontology of concrete, goal-directed entities and its modern interpretations, it argues that advanced AI systems can be seen as artifacts whose formal and material constitution gives rise to effects distinct from their designers' intentions. In this view, the instrumental tendencies of such systems correspond to per se outcomes of their constitution rather than accidental malfunctions. The implication is that efforts should focus less on eliminating instrumental goals and more on understanding, managing, and directing them toward human-aligned ends.


