Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years. Despite some progress, existing methods are mostly limited to highly curated datasets (e.g., K400) and exhibit unsatisfactory out-of-the-box representations. We argue that this is because they capture only pixel-level knowledge rather than spatiotemporal semantics, which hinders further progress in video understanding. Inspired by the great success of image-text pre-training (e.g., CLIP), we take the first step toward exploiting language semantics to boost transferable spatiotemporal representation learning. We introduce a new pretext task, Turning to Video for Transcript Sorting (TVTS), which sorts shuffled ASR transcripts by attending to learned video representations. We do not rely on descriptive captions and learn purely from video, i.e., we leverage naturally transcribed speech to provide noisy but useful semantics over time. Our method forces the vision model to contextualize what is happening over time so that it can re-organize the narrative transcripts, and it applies seamlessly to large-scale uncurated video data in the real world. Our method demonstrates strong out-of-the-box spatiotemporal representations on diverse benchmarks, e.g., +13.6% gains over VideoMAE on SSV2 via linear probing. The code is available at https://github.com/TencentARC/TVTS.
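To make the pretext task concrete, below is a minimal PyTorch sketch of the transcript-sorting idea described above: transcript segment embeddings are shuffled, cross-attend to video features, and a classifier predicts each segment's original position. The module names, dimensions, and the cross-attention design here are illustrative assumptions, not the authors' actual implementation (see the repository for that).

```python
# Hypothetical sketch of a TVTS-style transcript-sorting objective.
# Assumed components: a cross-attention head and per-segment position classifier.
import torch
import torch.nn as nn


class TranscriptSortingHead(nn.Module):
    """Predicts the original index of each shuffled transcript segment,
    conditioned on learned video representations via cross-attention."""

    def __init__(self, dim: int = 512, num_segments: int = 4, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_segments)  # one logit per candidate position

    def forward(self, text_feats: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (B, num_segments, dim) -- one embedding per shuffled transcript segment
        # video_feats: (B, num_video_tokens, dim) -- spatiotemporal tokens from the video encoder
        attended, _ = self.cross_attn(query=text_feats, key=video_feats, value=video_feats)
        return self.classifier(attended)  # (B, num_segments, num_segments) position logits


def transcript_sorting_loss(head: TranscriptSortingHead,
                            text_feats: torch.Tensor,
                            video_feats: torch.Tensor) -> torch.Tensor:
    """Shuffle transcript segments per sample and train the head to recover the order."""
    b, n, _ = text_feats.shape
    perm = torch.stack([torch.randperm(n) for _ in range(b)]).to(text_feats.device)
    shuffled = torch.gather(text_feats, 1, perm.unsqueeze(-1).expand_as(text_feats))
    logits = head(shuffled, video_feats)  # (B, N, N)
    # Target for shuffled slot i is its original position, i.e., perm[:, i].
    return nn.functional.cross_entropy(logits.flatten(0, 1), perm.flatten())


if __name__ == "__main__":
    head = TranscriptSortingHead()
    loss = transcript_sorting_loss(head, torch.randn(2, 4, 512), torch.randn(2, 32, 512))
    print(loss.item())
```

In this sketch the sorting signal flows through the video tokens, so the video encoder can only lower the loss by learning representations that capture the temporal narrative, which is the intuition behind TVTS.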