We prove that training a source model optimally for its own task is generically suboptimal when the objective is downstream transfer. We study the source-side optimization problem in L2-SP ridge regression and show a fundamental mismatch between the source-optimal and transfer-optimal source regularization: outside of a measure-zero set, $\tau_0^* \neq \tau_S^*$. We characterize the transfer-optimal source penalty $\tau_0^*$ as a function of task alignment and identify an alignment-dependent reversal: with imperfect alignment ($0<\rho<1$), transfer benefits from stronger source regularization, while in super-aligned regimes ($\rho>1$), transfer benefits from weaker regularization. In isotropic settings, whether transfer helps is independent of the target sample size and noise, depending only on task alignment and source characteristics. We verify the linear predictions in a synthetic ridge regression experiment, and we present CIFAR-10 experiments as evidence that the source-optimal versus transfer-optimal mismatch can persist in nonlinear networks.
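The synthetic setup described above can be sketched as follows. This is a minimal illustrative simulation, not the paper's actual experiment: the alignment $\rho$, the L2-SP strength $\lambda$, the noise level, and all problem sizes are assumed values chosen for the sketch. The source model is a ridge estimate $w_S(\tau)$, and the target model shrinks toward $w_S(\tau)$ via an L2-SP penalty; sweeping $\tau$ then compares the source-optimal and transfer-optimal penalties.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_S, n_T = 20, 100, 30     # dimension, source/target sample sizes (assumed)
sigma = 0.5                   # noise std (assumed)
rho = 0.6                     # task alignment, imperfect-alignment regime 0 < rho < 1

# Ground truths: w_tgt = rho * w_src + orthogonal component, so <w_src, w_tgt> = rho.
w_src = rng.standard_normal(d)
w_src /= np.linalg.norm(w_src)
v = rng.standard_normal(d)
v -= (v @ w_src) * w_src      # remove the component along w_src
v /= np.linalg.norm(v)
w_tgt = rho * w_src + np.sqrt(max(0.0, 1.0 - rho**2)) * v

X_S = rng.standard_normal((n_S, d))
y_S = X_S @ w_src + sigma * rng.standard_normal(n_S)
X_T = rng.standard_normal((n_T, d))
y_T = X_T @ w_tgt + sigma * rng.standard_normal(n_T)

lam = 1.0                     # fixed L2-SP strength on the target side (assumed)
taus = np.logspace(-2, 2, 50) # grid of source ridge penalties tau

src_risk, tr_risk = [], []
for tau in taus:
    # Source ridge solution: argmin ||y_S - X_S w||^2 + tau ||w||^2.
    w_S = np.linalg.solve(X_S.T @ X_S + tau * np.eye(d), X_S.T @ y_S)
    src_risk.append(np.sum((w_S - w_src) ** 2))
    # L2-SP target solution: argmin ||y_T - X_T w||^2 + lam ||w - w_S||^2.
    w_T = np.linalg.solve(X_T.T @ X_T + lam * np.eye(d), X_T.T @ y_T + lam * w_S)
    tr_risk.append(np.sum((w_T - w_tgt) ** 2))

tau_S_opt = taus[int(np.argmin(src_risk))]  # source-optimal penalty tau_S^*
tau_0_opt = taus[int(np.argmin(tr_risk))]   # transfer-optimal penalty tau_0^*
print(tau_S_opt, tau_0_opt)
```

For a typical draw, the two minimizers differ, matching the generic mismatch $\tau_0^* \neq \tau_S^*$; plotting both risk curves against $\tau$ makes the gap visible.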