AI safety and alignment research has predominantly focused on methods for safeguarding individual AI systems, resting on the assumption that a monolithic Artificial General Intelligence (AGI) will eventually emerge. The alternative AGI emergence hypothesis, in which general capability first manifests through coordination among groups of sub-AGI agents with complementary skills and affordances, has received far less attention. Here we argue that this patchwork AGI hypothesis deserves serious consideration and should inform the development of corresponding safeguards and mitigations. The rapid deployment of advanced AI agents with tool-use capabilities and the ability to communicate and coordinate makes this an urgent safety concern. We therefore propose a framework for distributional AGI safety that moves beyond evaluating and aligning individual agents. This framework centers on the design and implementation of virtual agentic sandbox economies (impermeable or semi-permeable), in which agent-to-agent transactions are governed by robust market mechanisms, coupled with appropriate auditability, reputation management, and oversight to mitigate collective risks.