Learning from Unlabeled 3D Environments for Vision-and-Language Navigation



August 24, 2022


Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

From arXiv; ECCV 2022

In vision-and-language navigation (VLN), an embodied agent is required to navigate in realistic 3D environments by following natural language instructions. One major bottleneck for existing VLN approaches is the lack of sufficient training data, which results in unsatisfactory generalization to unseen environments. VLN data is typically collected manually, but this approach is expensive and does not scale. In this work, we address the data scarcity issue by automatically creating a large-scale VLN dataset from 900 unlabeled 3D buildings from HM3D. We generate a navigation graph for each building and transfer 2D object predictions into pseudo 3D object labels via cross-view consistency. We then fine-tune a pretrained language model, using the pseudo object labels as prompts, to alleviate the cross-modal gap in instruction generation. The resulting HM3D-AutoVLN dataset is an order of magnitude larger than existing VLN datasets in terms of navigation environments and instructions. We experimentally demonstrate that HM3D-AutoVLN significantly improves the generalization ability of the resulting VLN models. On the SPL metric, our approach improves over the state of the art by 7.1% and 8.1% on the unseen validation splits of the REVERIE and SOON datasets, respectively.
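SPL (Success weighted by Path Length), the metric quoted above, credits an agent only for successful episodes and discounts each success by how much longer the agent's path was than the shortest path to the goal. Below is a minimal Python sketch of the commonly used definition, SPL = (1/N) Σ S_i · ℓ_i / max(p_i, ℓ_i). It is not the authors' evaluation code; the function name `spl` and its arguments are hypothetical, and path lengths are assumed to be precomputed (for example, as distances on the navigation graph).

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over N episodes.

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)
      S_i: 1 if episode i succeeded, else 0
      l_i: shortest-path distance from start to goal
      p_i: length of the path the agent actually took
    (Hypothetical helper for illustration, not from the paper.)
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if l <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if s:
            # A successful episode contributes 1.0 on the optimal path and
            # less when the agent detours (p > l); failures contribute 0.
            total += l / max(p, l)
    return total / n
```

For example, `spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0])` returns 0.5: the first episode follows the shortest path (1.0), the second takes twice the shortest distance (0.5), and the third fails (0), so the mean is 1.5 / 3. Taking max(p_i, ℓ_i) in the denominator keeps each term at most 1 even if a measured path is slightly shorter than the reference shortest path.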

