We study the automatic generation of navigation instructions from 360-degree images captured along indoor routes. Existing generators suffer from poor visual grounding, causing them to rely on language priors and hallucinate objects. Our MARKY-MT5 system addresses this by focusing on visual landmarks; it comprises a first-stage landmark detector and a second-stage generator -- a multimodal, multilingual, multitask encoder-decoder. To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset. Using text parsers, weak supervision from RxR's pose traces, and a multilingual image-text encoder trained on 1.8B images, we identify 971k English, Hindi and Telugu landmark descriptions and ground them to specific regions in panoramas. On Room-to-Room, human wayfinders obtain success rates (SR) of 71% following MARKY-MT5's instructions, just shy of their 75% SR following human instructions -- and well above SRs with other generators. Evaluations on RxR's longer, more diverse paths obtain 61-64% SRs in all three languages. Generating such high-quality navigation instructions in novel environments is a step towards conversational navigation tools and could facilitate larger-scale training of instruction-following agents.