与用于视听跨模式检索的CCA的变式自动编码器 (Variational Autoencoder with CCA for Audio-Visual Cross-Modal Retrieval) - 专知论文

会员服务 ·

0

INFORMS · 相关系数 · 变分自编码 · 自编码器 · 模态 ·

2021 年 12 月 5 日

Variational Autoencoder with CCA for Audio-Visual Cross-Modal Retrieval

翻译：与用于视听跨模式检索的CCA的变式自动编码器

Jiwei Zhang,Yi Yu,Suhua Tang,Jianming Wu,Wei Li

Cross-modal retrieval is to utilize one modality as a query to retrieve data from another modality, which has become a popular topic in information retrieval, machine learning, and database. How to effectively measure the similarity between different modality data is the major challenge of cross-modal retrieval. Although several reasearch works have calculated the correlation between different modality data via learning a common subspace representation, the encoder's ability to extract features from multi-modal information is not satisfactory. In this paper, we present a novel variational autoencoder (VAE) architecture for audio-visual cross-modal retrieval, by learning paired audio-visual correlation embedding and category correlation embedding as constraints to reinforce the mutuality of audio-visual information. On the one hand, audio encoder and visual encoder separately encode audio data and visual data into two different latent spaces. Further, two mutual latent spaces are respectively constructed by canonical correlation analysis (CCA). On the other hand, probabilistic modeling methods is used to deal with possible noise and missing information in the data. Additionally, in this way, the cross-modal discrepancy from intra-modal and inter-modal information are simultaneously eliminated in the joint embedding subspace. We conduct extensive experiments over two benchmark datasets. The experimental outcomes exhibit that the proposed architecture is effective in learning audio-visual correlation and is appreciably better than the existing cross-modal retrieval methods.

翻译：跨模式检索是使用一种方式作为查询,从另一种方式检索数据,这种方式已成为信息检索、机器学习和数据库中流行的主题。如何有效地衡量不同模式数据之间的相似性,是跨模式检索的主要挑战。虽然一些重新研究工程通过学习一个共同的子空间代表,计算了不同模式数据之间的相互关系,但编码器从多模式信息中提取特征的能力并不令人满意。在本文中,我们提出了一个新的变式自动编码器(VAE)结构,用于视听跨模式检索,其方法是学习配对的视听相关嵌入和类别相关嵌入,作为加强视听信息相互性的限制。一方面,音频编码器和视觉编码器单独将视听数据和视觉数据编码成两个不同的潜在空间。此外,两个相互潜在的空间分别由Canonic相关分析(CCA)构建。另一方面,比较式模型用于处理数据中可能存在的噪音和缺失的信息。此外,在这种方式上,跨模式的跨模式的互交式互换关联性嵌入和类别嵌入,是同时消除了内部模式和图像结构之间拟议中两个共同的互换式结果。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【ICCV2021】多层次对比学习的跨模态检索方法

【ICCV2021】多层次对比学习的跨模态检索方法

专知会员服务

23+阅读 · 2021年10月24日

【ICCV2021】模态视频表示的跨模态对比学习

专知会员服务

16+阅读 · 2021年10月4日

深度学习图像检索(CBIR): 十年之大综述

深度学习图像检索(CBIR): 十年之大综述

专知会员服务

47+阅读 · 2020年12月5日

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

专知会员服务

37+阅读 · 2020年5月9日

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

专知会员服务

52+阅读 · 2020年4月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

专知

6+阅读 · 2017年10月14日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Multi-modal data generation with a deep metric variational autoencoder

Arxiv

1+阅读 · 2022年2月7日

Open-book Video Captioning with Retrieve-Copy-Generate Network

Arxiv

7+阅读 · 2021年3月9日

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Arxiv

6+阅读 · 2020年10月12日

Predicting Visual Features from Text for Image and Video Caption Retrieval

Arxiv

5+阅读 · 2018年7月14日

Visual Data Synthesis via GAN for Zero-Shot Video Classification

Arxiv

3+阅读 · 2018年4月26日

Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning

Arxiv

4+阅读 · 2018年4月13日

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

Arxiv

10+阅读 · 2018年4月11日

Video Person Re-identification by Temporal Residual Learning

Arxiv

5+阅读 · 2018年2月22日

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Arxiv

3+阅读 · 2018年2月13日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

VIP会员

文章信息

相关主题

变分自编码

相关VIP内容

【ICCV2021】多层次对比学习的跨模态检索方法

【ICCV2021】多层次对比学习的跨模态检索方法

专知会员服务

23+阅读 · 2021年10月24日

【ICCV2021】模态视频表示的跨模态对比学习

专知会员服务

16+阅读 · 2021年10月4日

深度学习图像检索(CBIR): 十年之大综述

深度学习图像检索(CBIR): 十年之大综述

专知会员服务

47+阅读 · 2020年12月5日

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

专知会员服务

37+阅读 · 2020年5月9日

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

专知会员服务

52+阅读 · 2020年4月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

热门VIP内容

开通专知VIP会员享更多权益服务

《战区安全决策课程体系》最新244页

《"无人机航母"原型平台》

任务规划与地形分析：现代复杂环境作战导航体系

《攻击场景描述形式化模型研究》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

专知

6+阅读 · 2017年10月14日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Multi-modal data generation with a deep metric variational autoencoder

Arxiv

1+阅读 · 2022年2月7日

Open-book Video Captioning with Retrieve-Copy-Generate Network

Arxiv

7+阅读 · 2021年3月9日

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Arxiv

6+阅读 · 2020年10月12日

Predicting Visual Features from Text for Image and Video Caption Retrieval

Arxiv

5+阅读 · 2018年7月14日

Visual Data Synthesis via GAN for Zero-Shot Video Classification

Arxiv

3+阅读 · 2018年4月26日

Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning

Arxiv

4+阅读 · 2018年4月13日

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

Arxiv

10+阅读 · 2018年4月11日

Video Person Re-identification by Temporal Residual Learning

Arxiv

5+阅读 · 2018年2月22日

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Arxiv

3+阅读 · 2018年2月13日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

微信扫码咨询专知VIP会员