Current transfer learning methods for high-dimensional linear regression assume feature alignment across domains, restricting their applicability to semantically matched features. In many real-world scenarios, however, distinct features in the target and source domains can play similar predictive roles, creating a form of cross-semantic similarity. To leverage this broader transferability, we propose the Cross-Semantic Transfer Learning (CSTL) framework. It captures potential relationships by comparing each target coefficient with all source coefficients through a weighted fusion penalty. The weights are derived from the derivative of the SCAD penalty, effectively approximating an ideal weighting scheme that preserves transferable signals while filtering out source-specific noise. For computational efficiency, we implement CSTL using the Alternating Direction Method of Multipliers (ADMM). Theoretically, we establish that under mild conditions, CSTL achieves the oracle estimator with overwhelming probability. Empirical results from simulations and a real-data application confirm that CSTL outperforms existing methods in both cross-semantic and partial signal similarity settings.
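The SCAD-derivative weighting scheme can be illustrated with a minimal sketch. The exact fusion-penalty form used by CSTL is defined in the paper; the pairwise comparison below, along with the function name, the toy coefficients, and the tuning values λ = 0.5 and a = 3.7 (the conventional SCAD default), is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan & Li, 2001) for t >= 0.

    Equals lam on [0, lam], decays linearly to zero on (lam, a*lam],
    and is exactly zero beyond a*lam, so large coefficient gaps
    receive no fusion weight.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1) * (t > lam)

# Hypothetical weighting step: compare each target coefficient with every
# source coefficient. Small gaps keep a positive weight (transferable
# signal); large gaps get weight zero (source-specific noise is filtered).
beta = np.array([1.0, 0.2, -0.5])   # toy target coefficients
gamma = np.array([1.05, 3.0])       # toy source coefficients
gaps = np.abs(beta[:, None] - gamma[None, :])   # all target-source pairs
weights = scad_derivative(gaps, lam=0.5)
```

Because the SCAD derivative vanishes beyond a·λ, a target coefficient far from every source coefficient is left unpenalized, which is how the weighting approximates the ideal scheme of keeping transferable signals while discarding source-specific ones.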