Unified video and action prediction models hold great potential for robotic manipulation, as future observations offer contextual cues for planning, while actions reveal how interactions shape the environment. However, most existing approaches treat observation and action generation in a monolithic and goal-agnostic manner, often leading to semantically misaligned predictions and incoherent behaviors. To address this, we propose H-GAR, a Hierarchical interaction framework via Goal-driven observation-Action Refinement. To anchor prediction to the task objective, H-GAR first produces a goal observation and a coarse action sketch that outline a high-level route toward the goal. To enable explicit interaction between observation and action under the guidance of the goal observation, and thus more coherent decision-making, we devise two synergistic modules. (1) The Goal-Conditioned Observation Synthesizer (GOS) synthesizes intermediate observations based on the coarse-grained actions and the predicted goal observation. (2) The Interaction-Aware Action Refiner (IAAR) refines coarse actions into fine-grained, goal-consistent actions by leveraging feedback from the intermediate observations and a Historical Action Memory Bank that encodes prior actions to ensure temporal consistency. By integrating goal grounding with explicit action-observation interaction in a coarse-to-fine manner, H-GAR enables more accurate manipulation. Extensive experiments on both simulation and real-world robotic manipulation tasks demonstrate that H-GAR achieves state-of-the-art performance.
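The coarse-to-fine data flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so every function name, the list-of-floats representation of observations and actions, the linear goal placeholder, and the blending weights below are assumptions chosen only to show the order in which the stages exchange information (goal and coarse sketch, then GOS, then IAAR with a memory bank).

```python
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]  # assumed toy representation of an observation or an action


def predict_goal_and_sketch(obs: Vector, horizon: int) -> Tuple[Vector, List[Vector]]:
    """Stage 1 stand-in: return a goal observation and a coarse action sketch."""
    goal = [x + 1.0 for x in obs]  # placeholder goal, NOT a learned prediction
    step = [(g - o) / horizon for g, o in zip(goal, obs)]
    return goal, [list(step) for _ in range(horizon)]


def synthesize_observations(obs: Vector, coarse: List[Vector], goal: Vector) -> List[Vector]:
    """GOS stand-in: roll coarse actions forward, blending each frame toward the goal."""
    frames, current, n = [], obs, len(coarse)
    for t, action in enumerate(coarse, start=1):
        current = [c + a for c, a in zip(current, action)]
        w = t / n  # later frames lean more heavily on the goal observation
        frames.append([(1 - w) * c + w * g for c, g in zip(current, goal)])
    return frames


def refine_actions(obs: Vector, coarse: List[Vector], frames: List[Vector],
                   memory: Deque[Vector]) -> List[Vector]:
    """IAAR stand-in: mix coarse action, observation feedback, and action history."""
    refined, prev = [], obs
    for action, frame in zip(coarse, frames):
        # Feedback: the motion implied by consecutive synthesized observations.
        feedback = [f - p for f, p in zip(frame, prev)]
        # History: mean of the stored actions (falls back to the coarse action).
        hist = [sum(col) / len(memory) for col in zip(*memory)] if memory else action
        # The 0.5 / 0.4 / 0.1 weights are arbitrary illustrative values.
        new = [0.5 * a + 0.4 * fb + 0.1 * h for a, fb, h in zip(action, feedback, hist)]
        refined.append(new)
        memory.append(new)  # the memory bank accumulates refined actions
        prev = frame
    return refined


def run_pipeline(obs: Vector, horizon: int = 4) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Chain the three stages in the order the abstract describes."""
    memory: Deque[Vector] = deque(maxlen=8)  # assumed bounded history
    goal, coarse = predict_goal_and_sketch(obs, horizon)
    frames = synthesize_observations(obs, coarse, goal)
    refined = refine_actions(obs, coarse, frames, memory)
    return goal, frames, refined
```

Calling `run_pipeline([0.0, 0.0])` returns the placeholder goal, four synthesized intermediate observations, and four refined actions. In the actual method each stand-in would presumably be a learned generative module; the sketch only makes the dependency structure concrete.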


