Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Transfer learning aims to leverage knowledge from pre-trained models to benefit the target task. Prior transfer learning work mainly transfers from a single model. However, with the emergence of deep models pre-trained from different resources, model hubs consisting of diverse models with various architectures, pre-training datasets and learning paradigms are now available. Directly applying single-model transfer learning methods to each model wastes the abundant knowledge of the model hub and incurs high computational cost. In this paper, we propose a Hub-Pathway framework to enable knowledge transfer from a model hub. The framework generates data-dependent pathway weights. Based on these weights, we assign the pathway routes at the input level to decide which pre-trained models are activated and passed through, and then set the pathway aggregation at the output level to aggregate the knowledge from different models to make predictions. The proposed framework can be trained end-to-end with the target task-specific loss, where it learns to explore better pathway configurations and exploit the knowledge in pre-trained models for each target datum. We utilize a noisy pathway generator and design an exploration loss to further explore different pathways throughout the model hub. To fully exploit the knowledge in pre-trained models, each model is further trained on the specific data that activate it, which ensures its performance and enhances knowledge transfer. Experimental results on computer vision and reinforcement learning tasks demonstrate that the proposed Hub-Pathway framework achieves state-of-the-art performance for model hub transfer learning.
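The routing and aggregation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the generator architecture, the selection rule, or the noise distribution, so the linear generator (`gate`), the top-k selection, the Gaussian noise, and every function name below are assumptions made for illustration. The exploration loss is omitted because the abstract does not define it.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pathway_weights(x, gate, noise_std=0.0, rng=None):
    """Data-dependent pathway weights, one per hub model.

    `gate` is a hypothetical linear generator: one weight row per model.
    Gaussian noise on the logits stands in for the "noisy pathway
    generator"; the actual noise model is an assumption.
    """
    rng = rng or random
    logits = []
    for row in gate:
        z = sum(w * xi for w, xi in zip(row, x))
        if noise_std > 0:
            z += rng.gauss(0.0, noise_std)
        logits.append(z)
    return softmax(logits)


def hub_pathway_forward(x, models, gate, k=2, noise_std=0.0, rng=None):
    """Route the input to the k highest-weighted models, then aggregate."""
    weights = pathway_weights(x, gate, noise_std, rng)
    # Input-level routing: only the top-k models are activated (assumed rule).
    active = sorted(range(len(models)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in active)
    # Output-level aggregation: weighted sum with renormalized weights.
    output = None
    for i in active:
        features = models[i](x)  # inactive models are never evaluated
        scale = weights[i] / norm
        if output is None:
            output = [scale * f for f in features]
        else:
            output = [o + scale * f for o, f in zip(output, features)]
    return output, sorted(active), weights


# Toy hub: three stand-in "pre-trained models" that return fixed features.
toy_models = [
    lambda x: [1.0, 0.0],
    lambda x: [0.0, 1.0],
    lambda x: [1.0, 1.0],
]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_output, toy_active, toy_weights = hub_pathway_forward(
    [2.0, 1.0], toy_models, toy_gate, k=2
)
```

In this sketch only the activated models run a forward pass, which is where the computational saving over evaluating every hub model would come from. In an end-to-end setting the generator and the activated models would both receive gradients from the target task loss.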
