Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation

Few-shot (FS) and zero-shot (ZS) learning are two different approaches for scaling temporal action detection (TAD) to new classes. The former adapts a pretrained vision model to a new task represented by as few as a single video per class, whilst the latter requires no training examples and instead exploits a semantic description of the new class. In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered a marriage of FS-TAD and ZS-TAD that leverages few-shot support videos and new class names jointly. To tackle this problem, we further introduce a novel MUlti-modality PromPt mETa-learning (MUPPET) method. This is enabled by efficiently bridging pretrained vision and language models whilst maximally reusing already learned capacity. Concretely, we construct multi-modal prompts by mapping support videos into the textual token space of a vision-language model using a meta-learned, adapter-equipped visual semantics tokenizer. To tackle large intra-class variation, we further design a query feature regulation scheme. Extensive experiments on ActivityNet v1.3 and THUMOS14 demonstrate that MUPPET outperforms state-of-the-art alternative methods, often by a large margin. We also show that MUPPET can be easily extended to tackle the few-shot object detection problem, where it again achieves state-of-the-art performance on the MS-COCO dataset. The code will be available at https://github.com/sauradip/MUPPET
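The multi-modal prompt idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mean pooling over support videos, the plain linear maps standing in for the meta-learned, adapter-equipped tokenizer, and all dimensions are assumptions chosen only to show how visual pseudo-tokens and class-name token embeddings could share one token space.

```python
import random


def mean_pool(features):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def linear(vec, weight):
    """Apply a linear map; `weight` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def build_multimodal_prompt(support_features, class_name_tokens, tokenizer_weights):
    """Concatenate visual pseudo-tokens with class-name token embeddings.

    support_features: K support-video feature vectors (dimension D_V).
    class_name_tokens: token embeddings of the class name (dimension D_T each).
    tokenizer_weights: one D_T x D_V matrix per visual pseudo-token
        (hypothetical stand-in for a meta-learned visual tokenizer).
    """
    pooled = mean_pool(support_features)
    visual_tokens = [linear(pooled, w) for w in tokenizer_weights]
    return visual_tokens + class_name_tokens


# Toy data: random numbers replace real video and text encoders.
rng = random.Random(0)
D_V, D_T, K, NUM_VISUAL_TOKENS = 8, 4, 5, 2
support = [[rng.gauss(0, 1) for _ in range(D_V)] for _ in range(K)]
name_tokens = [[rng.gauss(0, 1) for _ in range(D_T)] for _ in range(3)]
weights = [
    [[rng.gauss(0, 0.1) for _ in range(D_V)] for _ in range(D_T)]
    for _ in range(NUM_VISUAL_TOKENS)
]

prompt = build_multimodal_prompt(support, name_tokens, weights)
print(len(prompt), len(prompt[0]))
```

In this sketch the prompt has one entry per visual pseudo-token plus one per class-name token, all in the text-token dimension, so a text encoder could in principle consume them as a single sequence.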

Related content

Few-shot learning

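Few-shot learning (FSL) addresses how to improve the performance of a neural network when only a small amount of data is available. One family of methods commonly used in FSL is known as meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but these are called meta-training and meta-testing respectively.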
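A minimal sketch of how meta-training and meta-testing are often organised into N-way K-shot episodes follows. The helper `sample_episode`, the toy class names, and the use of disjoint base and novel class sets are illustrative assumptions reflecting a common convention, not details given in the text above.

```python
import random


def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode from {class_name: [examples]}.

    Returns (support, query): lists of (example, label) pairs, with
    k_shot support and n_query query examples per sampled class.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        examples = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query


# Hypothetical toy data: strings stand in for videos or images.
base_classes = {
    f"base_{i}": [f"base_{i}_item_{j}" for j in range(10)] for i in range(6)
}
novel_classes = {
    f"novel_{i}": [f"novel_{i}_item_{j}" for j in range(10)] for i in range(3)
}

rng = random.Random(0)

# Meta-training: episodes are drawn from base classes only.
train_support, train_query = sample_episode(base_classes, 3, 1, 2, rng)

# Meta-testing: episodes are drawn from novel classes unseen in meta-training.
test_support, test_query = sample_episode(novel_classes, 3, 1, 2, rng)

print(len(train_support), len(train_query))
```

During meta-training a model would adapt on each episode's support set and be scored on its query set; meta-testing repeats the same procedure on the novel classes without further updates to the meta-learned parameters.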
