Recent advances in robotic manipulation have integrated low-level robotic control into Vision-Language Models (VLMs), extending them into Vision-Language-Action (VLA) models. Although state-of-the-art VLAs achieve strong performance in downstream robotic applications, supported by large-scale crowd-sourced robot training data, they still inevitably encounter failures during execution. Enabling robots to reason about and recover from unpredictable, abrupt failures remains a critical challenge. Existing robotic manipulation datasets, collected in either simulation or the real world, primarily provide only ground-truth trajectories, leaving robots unable to recover once failures occur. Moreover, the few datasets that address failure detection typically offer only textual explanations, which are difficult to use directly in VLA models. To address this gap, we introduce FailSafe, a novel failure generation and recovery system that automatically produces diverse failure cases paired with executable recovery actions. FailSafe can be seamlessly applied to any manipulation task in any simulator, enabling scalable creation of failure action data. To demonstrate its effectiveness, we fine-tune LLaVA-OneVision-7B (LLaVA-OV-7B) to build FailSafe-VLM. Experimental results show that FailSafe-VLM successfully helps robotic arms detect and recover from potential failures, improving the performance of three state-of-the-art VLA models (pi0-FAST, OpenVLA, OpenVLA-OFT) by up to 22.6% on average across several tasks in ManiSkill. Furthermore, FailSafe-VLM generalizes across different spatial configurations, camera viewpoints, objects, and robot embodiments. We plan to release the FailSafe code to the community.
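To make the intended usage concrete, the following is a minimal, hypothetical sketch of how a failure-detection and recovery VLM could wrap a VLA policy's control loop. Every name here (Observation, Policy, FailureVLM, run_episode, and the env interface) is an illustrative assumption for exposition, not the actual FailSafe or FailSafe-VLM API.

```python
# Hypothetical sketch: wrapping a VLA control loop with a failure-detection/recovery VLM.
# All class and method names are illustrative assumptions, not the FailSafe API.
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class Observation:
    rgb: bytes                 # camera frame (placeholder; a real system would use an image array)
    proprio: Sequence[float]   # joint / end-effector state


class Policy(Protocol):
    def act(self, obs: Observation, instruction: str) -> Sequence[float]: ...


class FailureVLM(Protocol):
    def detect(self, obs: Observation, instruction: str) -> bool: ...
    def recover(self, obs: Observation, instruction: str) -> Sequence[float]: ...


def run_episode(env, policy: Policy, monitor: FailureVLM,
                instruction: str, max_steps: int = 200) -> bool:
    """Roll out the VLA policy, querying the failure VLM at each step.

    When the monitor flags a failure, its executable recovery action is applied
    before control is handed back to the base policy.
    """
    obs = env.reset()
    for _ in range(max_steps):
        if monitor.detect(obs, instruction):
            action = monitor.recover(obs, instruction)   # corrective motion
        else:
            action = policy.act(obs, instruction)        # nominal VLA action
        obs, done, success = env.step(action)
        if done:
            return success
    return False
```

The design choice sketched here keeps the base VLA policy frozen and treats the failure VLM as an external monitor, which is one plausible way a recovery module could improve multiple off-the-shelf VLA policies without retraining them.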