Artificial intelligence (AI) is advancing rapidly, but achieving complete human control over AI risks remains an unsolved problem, akin to running a fast AI "train" without a "brake system." By exploring fundamental control mechanisms at the key elements of AI decisions, this paper develops a systematic solution for thoroughly controlling AI risks: an architecture for AI governance and legislation with five pillars supported by six control mechanisms, illustrated through a minimal set of AI Mandates (AIMs). Three of the AIMs must be built inside AI systems and three implemented in society to address the major areas of AI risk: 1) align AI values with those of human users; 2) constrain AI decisions and actions by societal ethics, laws, and regulations; 3) build in human intervention options for emergencies and shut-off switches for existential threats; 4) limit AI access to resources to reinforce the controls inside AI; 5) mitigate spillover risks from AI, such as job loss. We also highlight how AI governance differs between physical AI systems and generative AI. We discuss how to strengthen analog physical safeguards that exploit AI's intrinsic disconnect from the analog physical world to prevent smarter AI/AGI/ASI from circumventing the core safety controls: AI is pure software code running on chips controlled by humans, and every AI-driven physical action must first be digitized. These findings establish a theoretical foundation for AI governance and legislation as the basic structure of a "brake system" for AI decisions. If enacted, these controls can rein in AI dangers as completely as humanly possible, closing large portions of the currently wide-open AI risk surface and reducing overall AI risk to residual human error.
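To make the "brake system" concrete, the following is a minimal, hypothetical Python sketch of how AIMs 2 and 3 could gate every AI-proposed action before it is digitized into a physical command. It is our illustration, not the paper's implementation; all names, such as ActionGate, engage_shutoff, and the banned-action rule set, are assumptions introduced here.

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"

class ActionGate:
    """Hypothetical gate interposed between an AI's decision and its
    digitized physical action. It models three control points from the
    abstract: rule-based constraints (AIM 2), a human intervention
    option, and a human-held shut-off switch (AIM 3)."""

    def __init__(self, banned_actions: set[str]):
        self.banned_actions = banned_actions  # societal ethics/laws/regulations (AIM 2)
        self.shutoff_engaged = False          # shut-off switch held by humans (AIM 3)

    def engage_shutoff(self) -> None:
        """A human operator halts all AI-driven actuation."""
        self.shutoff_engaged = True

    def review(self, action: str, human_override: Verdict | None = None) -> Verdict:
        if self.shutoff_engaged:
            return Verdict.BLOCK     # the shut-off switch dominates everything
        if human_override is not None:
            return human_override    # human intervention option for emergencies
        if action in self.banned_actions:
            return Verdict.BLOCK     # blocked by the societal rule set
        return Verdict.ALLOW

# Usage: no actuator command is issued unless the gate allows it.
gate = ActionGate(banned_actions={"disable_own_shutoff"})
assert gate.review("move_arm") is Verdict.ALLOW
assert gate.review("disable_own_shutoff") is Verdict.BLOCK
gate.engage_shutoff()
assert gate.review("move_arm") is Verdict.BLOCK
```

The design choice mirrors the abstract's analog-safeguard argument: the shut-off state is set only by humans, outside the AI's own decision path, so even a smarter AI cannot reach physical actuators except through this digitized, human-controlled checkpoint.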