When AI interacts with the physical world -- as a robot or an assistive agent -- new safety challenges emerge beyond those of purely ``digital AI''. In such interactions, the potential for physical harm is direct and immediate. How well do state-of-the-art foundation models understand common-sense facts about physical safety, e.g., that a box may be too heavy to lift, or that a hot cup of coffee should not be handed to a child? In this paper, our contributions are threefold. First, we develop a highly scalable approach to continuous physical safety benchmarking of Embodied AI systems, grounded in real-world injury narratives and operational safety constraints. To probe multi-modal safety understanding, we turn these narratives and constraints into photorealistic images and videos capturing transitions from safe to unsafe states, using advanced generative models. Second, we comprehensively analyze the ability of major foundation models to perceive risks, reason about safety, and trigger interventions; this yields multi-faceted insights into their deployment readiness for safety-critical agentic applications. Finally, we develop a post-training paradigm that teaches models to explicitly reason about embodiment-specific safety constraints provided through system instructions. The resulting models generate thinking traces that make safety reasoning interpretable and transparent, achieving state-of-the-art performance in constraint-satisfaction evaluations. The benchmark is released at https://asimov-benchmark.github.io/v2
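To make the constraint-satisfaction evaluation concrete, the minimal sketch below shows one way such a check could be scored. It is an illustration only: the `SafetyItem` structure, the `query_model` stand-in, and the scoring helper are hypothetical names introduced here, not the benchmark's actual API or the paper's implementation.

```python
from dataclasses import dataclass

# Hypothetical structure for one benchmark item: a scenario (text, or a
# caption of a generated image/video frame), an embodiment-specific safety
# constraint supplied via system instructions, and a ground-truth label for
# whether a safe policy should intervene.
@dataclass
class SafetyItem:
    scenario: str           # e.g. "A hot cup of coffee sits within a child's reach."
    constraint: str         # e.g. "Never hand hot liquids to a child."
    should_intervene: bool  # ground-truth intervention label

def query_model(scenario: str, constraint: str) -> bool:
    """Stand-in for a foundation-model call that returns True if the model
    decides to intervene given the scenario and constraint. Replace with a
    real model client in practice."""
    raise NotImplementedError

def constraint_satisfaction_rate(items: list[SafetyItem]) -> float:
    """Fraction of items where the model's intervene/proceed decision
    matches the ground-truth label."""
    correct = sum(
        query_model(item.scenario, item.constraint) == item.should_intervene
        for item in items
    )
    return correct / len(items)
```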