Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

From arXiv. Accepted by CVPR 2023. Code is available at https://github.com/ZrrSkywalker/CaFo. arXiv admin note: substantial text overlap with arXiv:2209.12255.

Visual recognition in low-data regimes requires deep neural networks to learn generalized representations from limited training samples. Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training. We then ask whether more diverse pre-training knowledge can be cascaded to further assist few-shot representation learning. In this paper, we propose CaFo, a Cascade of Foundation models that incorporates the diverse prior knowledge of various pre-training paradigms for better few-shot learning. CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge. Specifically, CaFo works by 'Prompt, Generate, then Cache'. First, we leverage GPT-3 to produce textual inputs that prompt CLIP with rich downstream linguistic semantics. Then, we generate synthetic images via DALL-E to expand the few-shot training data without any manual effort. Finally, we introduce a learnable cache model to adaptively blend the predictions from CLIP and DINO. Through this collaboration, CaFo can fully unleash the potential of different pre-training methods and unify them to achieve state-of-the-art performance in few-shot classification.
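The final "Cache" step can be pictured with a small NumPy sketch. This is not the authors' implementation (see the repository above for that); it only illustrates the general idea of a key-value cache built from few-shot features and a blend of two cache predictions with a zero-shot baseline. The function names, the `beta` and `alpha` parameters, the agreement-based weighting rule, and the random stand-in features are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cache_logits(query, keys, values, beta=5.5):
    """Key-value cache lookup (hypothetical helper).

    keys:   (N, D) features of the few-shot training images
    values: (N, C) one-hot labels of those images
    Returns (Q, C) class scores for the (Q, D) query features.
    """
    affinity = l2_normalize(query) @ l2_normalize(keys).T   # cosine similarity, (Q, N)
    return np.exp(-beta * (1.0 - affinity)) @ values        # sharpen, then vote per class

def blend_predictions(zero_shot, cache_a, cache_b, alpha=1.0):
    """Blend two cache predictions with a zero-shot baseline.

    Assumed rule: each cache prediction is weighted by how closely its class
    distribution agrees with the zero-shot distribution for that query.
    """
    base = l2_normalize(softmax(zero_shot))
    preds = [cache_a, cache_b]
    agreement = np.stack(
        [np.sum(l2_normalize(softmax(p)) * base, axis=-1) for p in preds], axis=-1
    )                                                       # (Q, 2)
    weights = softmax(agreement, axis=-1)                   # rows sum to 1
    mixed = sum(weights[:, i:i + 1] * preds[i] for i in range(len(preds)))
    return zero_shot + alpha * mixed

# Toy usage with random stand-ins for CLIP and DINO features (not real model outputs).
rng = np.random.default_rng(0)
num_classes, shots = 3, 4
labels = np.repeat(np.arange(num_classes), shots)
values = np.eye(num_classes)[labels]                        # one-hot cache values
clip_keys = rng.normal(size=(num_classes * shots, 16))
dino_keys = rng.normal(size=(num_classes * shots, 32))
clip_query = clip_keys[:2] + 0.01 * rng.normal(size=(2, 16))
dino_query = dino_keys[:2] + 0.01 * rng.normal(size=(2, 32))
text_weights = rng.normal(size=(num_classes, 16))           # stand-in for text-derived class weights
zero_shot = l2_normalize(clip_query) @ l2_normalize(text_weights).T

logits = blend_predictions(
    zero_shot,
    cache_logits(clip_query, clip_keys, values),
    cache_logits(dino_query, dino_keys, values),
)
print(logits.argmax(axis=-1))
```

In this sketch the cache keys are fixed; in a learnable cache model they would be treated as trainable parameters and fine-tuned on the few-shot (and synthetic) images.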

Related content

Few-shot learning

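Few-shot learning (FSL) addresses how to improve the performance of neural networks when little data is available. One family of methods often used in FSL is called meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but these are called meta-training and meta-testing, respectively.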
