Vision-Language-Action models (VLAs) have demonstrated remarkable performance on complex robotic manipulation tasks through imitation learning. However, existing imitation learning datasets contain only successful trajectories and lack failure or recovery data, especially for out-of-distribution (OOD) states where the robot deviates from the demonstrated behavior due to minor perturbations or errors; as a result, VLA models struggle with states that deviate from the training distribution. To this end, we propose RESample, an automated OOD data augmentation framework based on exploratory sampling. Specifically, we first leverage offline reinforcement learning to obtain an action-value network that accurately identifies sub-optimal actions under the current manipulation policy. We then sample potential OOD states from trajectories via rollouts, and design an exploratory sampling mechanism that adaptively incorporates these action proxies into the training dataset to ensure efficiency. In this way, our framework explicitly encourages the VLAs to recover from OOD states and enhances their robustness against distributional shifts. We conduct extensive experiments on the LIBERO benchmark as well as real-world robotic manipulation tasks, demonstrating that RESample consistently improves the stability and generalization ability of VLA models.
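The core loop described above (roll out the policy, score each chosen action with a learned action-value network, and flag states whose actions look sub-optimal as OOD candidates) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `q_value`, `policy`, the alternative-action comparison, and the toy dynamics are all hypothetical stand-ins for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_value(state, action):
    # Hypothetical stand-in for the learned action-value network
    # obtained via offline RL; here just a toy scoring function.
    return -float(np.linalg.norm(np.asarray(state) - np.asarray(action)))

def policy(state):
    # Stand-in for the base manipulation policy: proposes an action
    # near the current state with small noise.
    return np.asarray(state) + rng.normal(scale=0.1, size=len(state))

def rollout_and_flag(init_state, steps=20, n_alts=8, margin=0.2):
    """Roll out the policy and flag states whose chosen action scores
    well below the best of a few sampled alternatives -- a simple
    proxy for sub-optimal actions / potential OOD states."""
    flagged = []
    state = np.asarray(init_state, dtype=float)
    for _ in range(steps):
        action = policy(state)
        # Compare the policy's action against randomly perturbed alternatives.
        alts = [action + rng.normal(scale=0.3, size=len(state))
                for _ in range(n_alts)]
        best_alt = max(q_value(state, a) for a in alts)
        if q_value(state, action) < best_alt - margin:
            # Candidate OOD state: add it to the augmentation pool so the
            # policy can later be trained to recover from it.
            flagged.append(state.copy())
        state = action  # toy dynamics: next state equals the action taken
    return flagged

ood_states = rollout_and_flag([0.0, 0.0])
```

In the actual framework, the flagged states would be paired with recovery supervision and adaptively merged into the imitation dataset; the sketch only shows the identification step.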