Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

Deep learning has revolutionized our ability to solve complex problems such as Vision-and-Language Navigation (VLN). This task requires an agent to follow natural language instructions to a goal using only visual sensory inputs. However, prior works formulate the problem as traversal of a navigation graph with a discrete action space. In this work, we lift the agent off the navigation graph and propose a more complex VLN setting in continuous 3D reconstructed environments. Our proposed setting, Robo-VLN, more closely mimics the challenges of real-world navigation. Robo-VLN tasks have longer trajectory lengths, continuous action spaces, and challenges such as obstacles. We provide a suite of baselines inspired by state-of-the-art works in discrete VLN and show that they are less effective at this task. We further propose that decomposing the task into specialized high- and low-level policies can tackle it more effectively. With extensive experiments, we show that by using layered decision making, modularized training, and decoupling reasoning and imitation, our proposed Hierarchical Cross-Modal (HCM) agent outperforms existing baselines in all key metrics and sets a new benchmark for Robo-VLN.


