Locformer:使变异器能够对具有特征抽样方法的长期未剪裁视频进行时间动态定位 (LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach) - 专知论文

会员服务 ·

0

Performer · state-of-the-art · 矩 · 样本 · 归纳偏好 ·

2021 年 12 月 19 日

LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach

翻译：Locformer:使变异器能够对具有特征抽样方法的长期未剪裁视频进行时间动态定位

Cristian Rodriguez-Opazo,Edison Marrese-Taylor,Basura Fernando,Hiroya Takamura,Qi Wu

We propose LocFormer, a Transformer-based model for video grounding which operates at a constant memory footprint regardless of the video length, i.e. number of frames. LocFormer is designed for tasks where it is necessary to process the entire long video and at its core lie two main contributions. First, our model incorporates a new sampling technique that splits the input feature sequence into a fixed number of sections and selects a single feature per section using a stochastic approach, which allows us to obtain a feature sample set that is representative of the video content for the task at hand while keeping the memory footprint constant. Second, we propose a modular design that separates functionality, enabling us to learn an inductive bias via supervising the self-attention heads, while also effectively leveraging pre-trained text and video encoders. We test our proposals on relevant benchmark datasets for video grounding, showing that not only LocFormer can achieve excellent results including state-of-the-art performance on YouCookII, but also that our sampling technique is more effective than competing counterparts and that it consistently improves the performance of prior work, by up to 3.13\% in the mean temporal IoU, ultimately leading to a new state-of-the-art performance on Charades-STA.

翻译：我们提议了LocFormer, 一种基于变异器的视频定位模型, 不论视频长度, 都以恒定的记忆足迹运行, 即框架数。 LocFormer 的设计是针对需要处理整个长视频的任务设计的, 而其核心是两个主要贡献。首先, 我们的模式包含一种新的抽样技术, 将输入特征序列分成固定数量的部分, 并使用随机方法选择每个部分的单一特征, 从而使我们能够获得一个代表手头任务视频内容的特征样本集, 同时保持记忆足迹不变。其次, 我们提议了一个模块设计, 将功能分开, 使我们能够通过监管自省头来学习感性偏差, 同时有效地利用预先培训的文本和视频编码器。我们测试了我们关于相关基准数据集的建议, 用于视频定位, 显示不仅 LocFormer 能够取得极好的结果, 包括YouCookII 的状态性能, 而且我们的采样技术比竞争对手更有效, 并且它能够不断改进先前工作的性能, 通过监管自控头头的状态, 最终到 Char- 。

0

相关内容

Performer

71页PDF，Intro to the Metaverse（元宇宙概念发展透析），Newzoo Trend Report 2021

71页PDF，Intro to the Metaverse（元宇宙概念发展透析），Newzoo Trend Report 2021

专知会员服务

22+阅读 · 2022年2月19日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

323+阅读 · 2020年11月26日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【变分推断课件】Lectures on Variational Inference：Statistical Analysis of Variational Approximations（附带pdf）

【变分推断课件】Lectures on Variational Inference：Statistical Analysis of Variational Approximations（附带pdf）

专知会员服务

16+阅读 · 2019年11月30日

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

专知会员服务

35+阅读 · 2019年11月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

开放知识图谱

5+阅读 · 2019年4月16日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

专知

7+阅读 · 2018年3月21日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Statistical and Spatio-temporal Hand Gesture Features for Sign Language Recognition using the Leap Motion Sensor

Arxiv

0+阅读 · 2022年2月22日

LOTR: Face Landmark Localization Using Localization Transformer

Arxiv

0+阅读 · 2022年2月22日

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

Arxiv

0+阅读 · 2022年2月18日

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Arxiv

3+阅读 · 2021年3月22日

Rethinking Positional Encoding in Language Pre-training

Arxiv

4+阅读 · 2020年7月9日

Improving Tree-LSTM with Tree Attention

Arxiv

4+阅读 · 2019年1月1日

Learning Discriminative Motion Features Through Detection

Learning Discriminative Motion Features Through Detection

Arxiv

3+阅读 · 2018年12月11日

Detect-and-Track: Efficient Pose Estimation in Videos

Arxiv

5+阅读 · 2018年5月2日

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Arxiv

6+阅读 · 2018年4月15日

Saliency-Enhanced Robust Visual Tracking

Arxiv

6+阅读 · 2018年2月8日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

71页PDF，Intro to the Metaverse（元宇宙概念发展透析），Newzoo Trend Report 2021

71页PDF，Intro to the Metaverse（元宇宙概念发展透析），Newzoo Trend Report 2021

专知会员服务

22+阅读 · 2022年2月19日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

323+阅读 · 2020年11月26日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【变分推断课件】Lectures on Variational Inference：Statistical Analysis of Variational Approximations（附带pdf）

【变分推断课件】Lectures on Variational Inference：Statistical Analysis of Variational Approximations（附带pdf）

专知会员服务

16+阅读 · 2019年11月30日

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

专知会员服务

35+阅读 · 2019年11月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】Seg4Diff：揭示文本到图像扩散 Transformer 中的开放词汇分割

【NeurIPS2025】DNA-DetectLLM：基于 DNA 启发的“突变-修复”范式揭示 AI 生成文本

【NTU博士论文】让语言模型成为更类人的学习者

强化学习遇见大语言模型：贯穿 LLM 生命周期的进展与应用综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

开放知识图谱

5+阅读 · 2019年4月16日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

专知

7+阅读 · 2018年3月21日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Statistical and Spatio-temporal Hand Gesture Features for Sign Language Recognition using the Leap Motion Sensor

Arxiv

0+阅读 · 2022年2月22日

LOTR: Face Landmark Localization Using Localization Transformer

Arxiv

0+阅读 · 2022年2月22日

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

Arxiv

0+阅读 · 2022年2月18日

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Arxiv

3+阅读 · 2021年3月22日

Rethinking Positional Encoding in Language Pre-training

Arxiv

4+阅读 · 2020年7月9日

Improving Tree-LSTM with Tree Attention

Arxiv

4+阅读 · 2019年1月1日

Learning Discriminative Motion Features Through Detection

Learning Discriminative Motion Features Through Detection

Arxiv

3+阅读 · 2018年12月11日

Detect-and-Track: Efficient Pose Estimation in Videos

Arxiv

5+阅读 · 2018年5月2日

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Arxiv

6+阅读 · 2018年4月15日

Saliency-Enhanced Robust Visual Tracking

Arxiv

6+阅读 · 2018年2月8日

微信扫码咨询专知VIP会员