A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires a dataset large enough to be representative of the underlying distribution of the source. When data is limited, the learned models fail to generalize to cases not seen during training. This paper introduces a multi-task \emph{cross-learning} framework that overcomes data scarcity by jointly estimating \emph{deterministic} parameters across multiple related tasks. We formulate this joint estimation as a constrained optimization problem in which the constraints dictate the degree of similarity between the parameters of the different models, allowing the estimates to differ across tasks while still combining information from multiple data sources. The framework enables knowledge transfer from tasks with abundant data to those with scarce data, yielding more accurate and reliable parameter estimates and offering a principled solution for scenarios where parameter inference from limited data is critical. We provide theoretical guarantees in a controlled setting with Gaussian data and demonstrate the effectiveness of our cross-learning method in applications with real data, including image classification and the propagation of infectious diseases.
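To make the formulation concrete, the sketch below illustrates one plausible instantiation of the constrained joint estimation on scalar Gaussian mean estimation, the controlled setting mentioned above: a quadratic penalty coupling the per-task estimates stands in for the similarity constraint. The task setup, penalty form, and all names and values are illustrative assumptions, not the paper's exact formulation.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (an assumption, not the paper's experiment):
# T related tasks, each estimating the mean of a Gaussian source.
# Task 0 is data-scarce; the remaining tasks have abundant samples.
true_means = np.array([1.0, 1.1, 0.9, 1.05])
n_samples = [5, 500, 500, 500]
data = [rng.normal(m, 1.0, size=n) for m, n in zip(true_means, n_samples)]

def cross_learning_estimate(data, lam=5.0, steps=5000, lr=5e-4):
    """Jointly estimate one deterministic parameter per task.

    Minimizes  sum_t n_t * (theta_t - xbar_t)^2
             + lam * sum_t (theta_t - mean(theta))^2,
    a quadratic-penalty surrogate for a similarity constraint
    that couples the per-task estimates.
    """
    xbar = np.array([d.mean() for d in data])          # per-task sample means
    n = np.array([len(d) for d in data], dtype=float)  # per-task sample sizes
    theta = xbar.copy()                                # warm start at per-task MLEs
    for _ in range(steps):
        center = theta.mean()
        # Exact gradient of the penalized objective above.
        grad = 2.0 * n * (theta - xbar) + 2.0 * lam * (theta - center)
        theta -= lr * grad
    return theta

theta_cross = cross_learning_estimate(data)
theta_solo = np.array([d.mean() for d in data])  # no information sharing
print("per-task MLE :", np.round(theta_solo, 3))
print("cross-learned:", np.round(theta_cross, 3))
\end{verbatim}

Because the penalty pulls each estimate toward the average of all estimates, the data-scarce task borrows strength from the data-rich ones, while the estimates are still allowed to differ across tasks; the penalty weight plays the role that the constraint level plays in the constrained formulation.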