Prominent AI companies are producing 'safety frameworks' as a form of voluntary self-governance. These documents purport to establish risk thresholds and safety procedures for the development and deployment of highly capable AI. Understanding which AI risks they cover and which actions they allow, refuse, demand, encourage, or discourage is vital for assessing how these frameworks actually govern AI development and deployment. We draw on affordance theory, applying the Mechanisms & Conditions model of affordances and the MIT AI Risk Repository, to analyse OpenAI's 'Preparedness Framework Version 2' (April 2025). We find that this safety policy requests evaluation of only a small minority of AI risks, encourages deployment of systems with 'Medium' capabilities for unintentionally enabling 'severe harm' (which OpenAI defines as >1,000 deaths or >$100B in damages), and allows OpenAI's CEO to deploy even more dangerous capabilities. These findings suggest that effective mitigation of AI risks requires more robust governance interventions than current industry self-regulation provides. Our affordance analysis offers a replicable method for evaluating what safety frameworks actually permit versus what they claim.