People can easily imagine the sound an event would make while watching it. This natural synchronization between audio and visual signals reveals their intrinsic correlations. To this end, we propose to learn audio-visual correlations from the perspective of cross-modal generation in a self-supervised manner; the learned correlations can then be readily applied to multiple downstream tasks such as audio-visual cross-modal localization and retrieval. We introduce a novel Variational AutoEncoder (VAE) framework consisting of Multiple encoders and a Shared decoder (MS-VAE), together with an additional Wasserstein distance constraint, to tackle this problem. Extensive experiments demonstrate that the optimized latent representation of the proposed MS-VAE effectively captures audio-visual correlations and can be readily applied to multiple audio-visual downstream tasks, achieving competitive performance even without any label information during training.
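To make the multiple-encoder/shared-decoder idea concrete, below is a minimal sketch of such a framework in PyTorch. All specifics here are assumptions for illustration, not the paper's implementation: the class and function names (MSVAE, Encoder, wasserstein2_diag_gaussians), the feature dimensions, the choice of the audio feature as the shared reconstruction target, and the loss weighting lam are hypothetical. The Wasserstein term uses the closed-form squared 2-Wasserstein distance between diagonal Gaussians to pull the audio and visual posteriors together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a modality-specific feature vector to diagonal-Gaussian latent parameters."""
    def __init__(self, in_dim, latent_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class MSVAE(nn.Module):
    """Hypothetical MS-VAE sketch: one encoder per modality, one shared decoder."""
    def __init__(self, audio_dim=128, visual_dim=512, latent_dim=64):
        super().__init__()
        self.enc_a = Encoder(audio_dim, latent_dim)   # audio encoder
        self.enc_v = Encoder(visual_dim, latent_dim)  # visual encoder
        # Shared decoder: both latents must decode the same target (assumed: audio).
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, audio_dim))

    @staticmethod
    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

def kl_to_standard_normal(mu, logvar):
    """Standard VAE KL term against N(0, I)."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

def wasserstein2_diag_gaussians(mu_a, logvar_a, mu_v, logvar_v):
    """Closed-form squared 2-Wasserstein distance between diagonal Gaussians:
    ||mu_a - mu_v||^2 + ||sigma_a - sigma_v||^2."""
    std_a, std_v = torch.exp(0.5 * logvar_a), torch.exp(0.5 * logvar_v)
    return ((mu_a - mu_v).pow(2).sum(1) + (std_a - std_v).pow(2).sum(1)).mean()

def msvae_loss(model, audio, visual, lam=1.0):
    mu_a, lv_a = model.enc_a(audio)
    mu_v, lv_v = model.enc_v(visual)
    z_a = model.reparameterize(mu_a, lv_a)
    z_v = model.reparameterize(mu_v, lv_v)
    # Both latents reconstruct the same target through the shared decoder,
    # so each modality's latent must carry the cross-modal information.
    rec = F.mse_loss(model.dec(z_a), audio) + F.mse_loss(model.dec(z_v), audio)
    kl = kl_to_standard_normal(mu_a, lv_a) + kl_to_standard_normal(mu_v, lv_v)
    w = wasserstein2_diag_gaussians(mu_a, lv_a, mu_v, lv_v)
    return rec + kl + lam * w
```

Under this sketch, no labels are needed: training is driven entirely by reconstruction, the KL prior, and the Wasserstein alignment, and the aligned latent space can then be queried directly for cross-modal localization and retrieval.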