Although several studies have addressed audio-visual sound source localization in unconstrained videos, no datasets or metrics have been proposed in the literature to quantitatively evaluate its performance. Defining the ground truth for sound source localization is difficult, because the location where the sound is produced is not limited to the extent of the source object: vibrations propagate and spread through surrounding objects. We therefore propose a new concept, the Sounding Object, to reduce the ambiguity of the visual location of sound, making it possible to annotate the locations of a wide range of sound sources. Together with newly proposed metrics for quantitative evaluation, we formulate the problem of Audio-Visual Sounding Object Localization (AVSOL). We also create an evaluation dataset (the AVSOL-E dataset) by manually annotating the test set of the well-known Audio-Visual Event (AVE) dataset. To tackle this new AVSOL problem, we propose a novel multitask training strategy and architecture called Dual Normalization Multitasking (DNM), which aggregates the Audio-Visual Correspondence (AVC) task and the video-event classification task into a single audio-visual similarity map. By efficiently utilizing both supervisions via DNM, our proposed architecture significantly outperforms the baseline methods.