Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation


Representation · Robotics · Categories · 3D Scenes · Agents

April 21, 2023


Hongcheng Wang, Yuxuan Wang, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong

Visual-audio navigation (VAN) is attracting increasing attention from the robotics community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to a sound source using egocentric visual and audio observations. However, existing methods are limited in two respects: 1) poor generalization to unheard sound categories; 2) sample inefficiency in training. To address these two problems, we propose a brain-inspired plug-and-play method that learns a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We design two auxiliary tasks, one for each of these desired characteristics, to accelerate learning of the representation. With these two auxiliary tasks, the agent learns a spatially correlated representation of visual and audio inputs that can be applied to environments with novel sounds and maps. Experimental results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when transferred zero-shot to scenes with unseen maps and unheard sound categories.

