Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source model to the target domain -- possibly because fine-tuning allows the model to leverage useful information from intermediate layers that would otherwise be discarded by the later pretrained layers. We explore the hypothesis that these intermediate layers can be exploited directly. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on the VTAB-1k benchmark, Head2Toe matches the average performance of fine-tuning while reducing training and storage costs a hundredfold or more; critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.
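To make the idea concrete, below is a minimal sketch, not the authors' implementation, of probing all layers of a frozen backbone. It assumes a torchvision ResNet-50 and a 10-class target task (both assumptions), global-average-pools each block's activations into flat vectors, concatenates them, and trains only a linear head on top. Head2Toe's feature-selection step, which keeps only a sparse subset of these concatenated features, is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Freeze a pretrained source model (ResNet-50 is an assumption made
# for this sketch, not a detail taken from the abstract).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in backbone.parameters():
    p.requires_grad = False

# Capture the output of every top-level block via forward hooks.
features = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Global-average-pool spatial maps so each layer yields a flat vector.
        features[name] = output.mean(dim=(2, 3)) if output.dim() == 4 else output
    return hook

for name, module in backbone.named_children():
    if name not in ("avgpool", "fc"):  # skip the original pooling and head
        module.register_forward_hook(make_hook(name))

# A dummy target-domain batch; in practice this would be target-task data.
x = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    backbone(x)

# Concatenate features from all layers and train only the new head.
all_feats = torch.cat(list(features.values()), dim=1)
head = nn.Linear(all_feats.shape[1], 10)  # 10 = assumed number of target classes
logits = head(all_feats)  # only `head` is optimized, with a standard classification loss
```

Because training reduces to fitting a single linear layer over fixed features, the cost is comparable to linear probing rather than to updating the full backbone, which is the source of the large training and storage savings claimed above.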