The size of vision models has grown exponentially over the last few years, especially after the emergence of the Vision Transformer. This has motivated the development of parameter-efficient tuning methods, such as learning adapter layers or visual prompt tokens, which allow a tiny portion of model parameters to be trained while the vast majority, obtained from pre-training, remain frozen. However, designing a proper tuning method is non-trivial: one might need to try out a lengthy list of design choices, not to mention that each downstream dataset often requires custom designs. In this paper, we view the existing parameter-efficient tuning methods as "prompt modules" and propose Neural prOmpt seArcH (NOAH), a novel approach that learns, for large vision models, the optimal design of prompt modules through a neural architecture search algorithm, specifically for each downstream dataset. By conducting extensive experiments on over 20 vision datasets, we demonstrate that NOAH (i) is superior to individual prompt modules, (ii) has a good few-shot learning ability, and (iii) is domain-generalizable. The code and models are available at https://github.com/Davidzhangyuanhan/NOAH.
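To make the parameter-efficiency argument concrete, here is a minimal NumPy sketch of one common "prompt module" mentioned above, a bottleneck adapter: a down-projection, a nonlinearity, and an up-projection wrapped in a residual connection. Only the two small projection matrices would be trained; the backbone weights stay frozen. The dimensions and initialization are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add.
    Only W_down and W_up are trainable; the frozen backbone produces x."""
    h = np.maximum(x @ W_down, 0.0)  # ReLU nonlinearity
    return x + h @ W_up              # residual connection keeps x's shape

rng = np.random.default_rng(0)
d, r = 768, 8                        # hidden dim (ViT-Base-like), bottleneck dim (assumed)
W_down = 0.01 * rng.standard_normal((d, r))
W_up = 0.01 * rng.standard_normal((r, d))

x = rng.standard_normal((4, d))      # a batch of 4 token embeddings
y = adapter(x, W_down, W_up)
print(y.shape)                       # (4, 768)
print(W_down.size + W_up.size)       # 12288 trainable parameters
```

With these assumed sizes, the adapter adds only 2·d·r = 12,288 trainable parameters per layer, a tiny fraction of the tens of millions in the frozen backbone, which is what makes searching over such modules per-dataset affordable.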