We address the general task of learning with a set of candidate models that is too large for uniform convergence of empirical estimates to true losses to hold. While the common approach to such challenges is SRM-based (or regularization-based) learning algorithms, we propose a novel learning paradigm that relies on a stronger incorporation of empirical data and requires fewer algorithmic decisions to be based on prior assumptions. We analyze the generalization capabilities of our approach and demonstrate its merits under several common learning assumptions, including similarity of close points, clustering of the domain into highly label-homogeneous regions, Lipschitzness of the labeling rule, and contrastive learning assumptions. Our approach allows utilizing such assumptions without needing to know their true parameters a priori.
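For reference, the failing guarantee can be stated via the standard definition of uniform convergence (the notation $\mathcal{H}$, $L_{\mathcal{D}}$, $L_S$, $\epsilon(m)$ is ours, not the paper's): with high probability over an i.i.d. sample $S$ of size $m$,

$$\sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - L_{S}(h) \bigr| \le \epsilon(m),$$

where $L_{\mathcal{D}}$ and $L_S$ denote the true and empirical losses of a model $h$. The setting studied here is precisely one in which no such bound holds uniformly over the candidate set, so empirical loss minimization alone does not guarantee generalization.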