Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning

Large-scale datasets are the cornerstone of representation learning. Existing self-supervised approaches extract learning signals by making certain assumptions about the data, e.g., spatio-temporal continuity and multimodal correspondence. However, finding large amounts of data that satisfy such assumptions is not straightforward, which forces the community to rely on datasets collected through laborious annotation and/or manual filtering. In this paper, we propose a subset optimization approach for automatic dataset curation. Focusing on audio-visual representation learning, we find a subset that provides the maximum mutual information between the audio and visual channels in videos. We show that self-supervised models trained on our data, even though that data is constructed automatically, achieve downstream performance competitive with models trained on existing datasets that require annotation and/or manual filtering. The most significant benefit of our approach is scalability. We release a dataset of 100M videos with high audio-visual correspondence.
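To make the idea of picking a subset that maximizes audio-visual mutual information concrete, here is a minimal sketch. It rests on assumptions the abstract does not confirm: each video is represented by a pair of discrete labels (a hypothetical audio cluster ID and a hypothetical visual cluster ID), mutual information is estimated with a simple plug-in estimator over label counts, and the subset is grown greedily one video at a time. The function names `mutual_information` and `greedy_subset` are illustrative, and the quadratic-cost search is meant for toy inputs only.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate (in nats) of the MI between two discrete labels.

    `pairs` is a list of (audio_label, visual_label) tuples; both labels are
    assumed to be cluster IDs produced by some upstream step.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    audio = Counter(a for a, _ in pairs)
    visual = Counter(v for _, v in pairs)
    mi = 0.0
    for (a, v), count in joint.items():
        # p(a, v) * log( p(a, v) / (p(a) * p(v)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (audio[a] * visual[v]))
    return mi


def greedy_subset(pairs, k):
    """Greedily select up to k indices whose label pairs maximize the MI estimate."""
    selected = []
    remaining = set(range(len(pairs)))
    for _ in range(min(k, len(pairs))):
        # Try each remaining video and keep the one that yields the highest MI.
        best = max(
            sorted(remaining),
            key=lambda i: mutual_information([pairs[j] for j in selected] + [pairs[i]]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: indices 0, 1, 4, 5 have matching audio/visual clusters,
    # indices 2 and 3 are mismatched.
    toy = [(0, 0), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]
    print(greedy_subset(toy, 4))
```

On the toy input, videos whose audio and visual cluster labels agree raise the estimate, so the greedy loop favors them over mismatched ones. A search at the scale of 100M videos would need a far cheaper selection strategy than this exhaustive loop.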


