Recent advances in robotic manipulation have integrated low-level robotic control into Vision-Language Models (VLMs), extending them into Vision-Language-Action (VLA) models. Although state-of-the-art VLAs achieve strong performance in downstream robotic applications, supported by large-scale crowd-sourced robot training data, they still inevitably encounter failures during execution. Enabling robots to reason about and recover from unpredictable, abrupt failures remains a critical challenge. Existing robotic manipulation datasets, collected in either simulation or the real world, primarily provide only ground-truth trajectories, leaving robots unable to recover once failures occur. Moreover, the few datasets that address failure detection typically offer only textual explanations, which are difficult to use directly in VLA models. To address this gap, we introduce FailSafe, a novel failure generation and recovery system that automatically produces diverse failure cases paired with executable recovery actions. FailSafe can be seamlessly applied to any manipulation task in any simulator, enabling scalable creation of failure action data. To demonstrate its effectiveness, we fine-tune LLaVA-OneVision-7B (LLaVA-OV-7B) to build FailSafe-VLM. Experimental results show that FailSafe-VLM successfully helps robotic arms detect and recover from potential failures, improving the performance of three state-of-the-art VLA models (pi0-FAST, OpenVLA, OpenVLA-OFT) by up to 22.6% on average across several tasks in ManiSkill. Furthermore, FailSafe-VLM generalizes across different spatial configurations, camera viewpoints, objects, and robot embodiments. We plan to release the FailSafe code to the community.
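To make the intended usage concrete, the following is a minimal, hypothetical sketch of how a failure-detection and recovery VLM could wrap a VLA policy's control loop. Every name here (Observation, Policy, FailureVLM, run_episode, and the env interface) is an illustrative assumption for exposition, not the actual FailSafe or FailSafe-VLM API.

```python
# Hypothetical sketch: wrapping a VLA control loop with a failure-detection/recovery VLM.
# All class and method names are illustrative assumptions, not the FailSafe API.
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class Observation:
    rgb: bytes                 # camera frame (placeholder; a real system would use an image array)
    proprio: Sequence[float]   # joint / end-effector state


class Policy(Protocol):
    def act(self, obs: Observation, instruction: str) -> Sequence[float]: ...


class FailureVLM(Protocol):
    def detect(self, obs: Observation, instruction: str) -> bool: ...
    def recover(self, obs: Observation, instruction: str) -> Sequence[float]: ...


def run_episode(env, policy: Policy, monitor: FailureVLM,
                instruction: str, max_steps: int = 200) -> bool:
    """Roll out the VLA policy, querying the failure VLM at each step.

    When the monitor flags a failure, its executable recovery action is applied
    before control is handed back to the base policy.
    """
    obs = env.reset()
    for _ in range(max_steps):
        if monitor.detect(obs, instruction):
            action = monitor.recover(obs, instruction)   # corrective motion
        else:
            action = policy.act(obs, instruction)        # nominal VLA action
        obs, done, success = env.step(action)
        if done:
            return success
    return False
```

The design choice sketched here keeps the base VLA policy frozen and treats the failure VLM as an external monitor, which is one plausible way a recovery module could improve multiple off-the-shelf VLA policies without retraining them.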