Empirical risk minimization (ERM) is a principle in statistical learning theory that defines a family of learning algorithms and is used to give theoretical bounds on their performance. The ERM strategy holds that the model with the smallest empirical risk is the optimal model; under this strategy, finding the optimal model amounts to solving an optimization problem.
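
Concretely, given training samples $(x_1, y_1), \dots, (x_n, y_n)$, a hypothesis class $\mathcal{F}$, and a loss function $L$, that optimization problem is the minimization of the average loss over the training set:

$$\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; R_{\mathrm{emp}}(f), \qquad R_{\mathrm{emp}}(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} L\big(y_i,\, f(x_i)\big).$$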

Machine learning algorithms based on empirical risk minimization greedily exploit correlations among the training data; these correlations are not stable under distribution shifts, so such algorithms generalize poorly. Recently, invariant learning methods, which leverage multiple training environments to find invariant relationships, have been proposed for out-of-distribution (OOD) generalization. However, modern datasets are often assembled by merging data from multiple sources without explicit source labels, and the resulting unobserved heterogeneity renders many invariant learning methods inapplicable. This paper proposes the Heterogeneous Risk Minimization (HRM) framework to jointly learn the latent heterogeneity among the data and the invariant relationship, enabling stable prediction under distribution shifts. We theoretically characterize the role of environment labels in invariant learning and justify the newly proposed HRM framework. Extensive experimental results validate the effectiveness of HRM.

https://arxiv.org/abs/2105.03818
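
As a rough illustration of the two-stage idea in the abstract above (a sketch under assumptions, not the authors' implementation: the 1-D k-means environment inference, the least-squares loss, and the variance-of-risks penalty are all stand-ins chosen here), one can alternate between inferring latent environments from how well the current model fits each sample and refitting the model so that its risk is similar across the inferred environments:

```python
import numpy as np

def infer_environments(sample_losses, n_envs=2, n_iters=10):
    """Heterogeneity identification (stand-in): cluster samples into latent
    'environments' by 1-D k-means on their per-sample losses, so that samples
    the current predictor fits very differently land in different groups."""
    centers = np.quantile(sample_losses, np.linspace(0.1, 0.9, n_envs))
    for _ in range(n_iters):
        assign = np.argmin(np.abs(sample_losses[:, None] - centers[None, :]), axis=1)
        for k in range(n_envs):
            if np.any(assign == k):
                centers[k] = sample_losses[assign == k].mean()
    return assign

def penalized_risk(w, X, y, env, lam=1.0):
    """Average per-environment risk plus a penalty on the variance of those
    risks; the penalty pushes the model toward relationships that hold in
    every inferred environment (an invariance surrogate, not HRM's exact loss)."""
    risks = np.array([np.mean((X[env == k] @ w - y[env == k]) ** 2)
                      for k in np.unique(env)])
    return risks.mean() + lam * risks.var()

# Toy alternation: infer environments from the current fit, then refit the
# predictor on the penalized objective (crude finite-difference descent).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)
w = np.zeros(3)
for _ in range(5):
    env = infer_environments((X @ w - y) ** 2)
    for _ in range(200):
        grad = np.array([(penalized_risk(w + 1e-4 * e, X, y, env)
                          - penalized_risk(w - 1e-4 * e, X, y, env)) / 2e-4
                         for e in np.eye(3)])
        w -= 0.1 * grad
```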

Latest Papers

In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the staple for determining $\epsilon$-approximate solutions to the ERM problem. Similar to standard optimization algorithms, the convergence analysis of FedAve only relies on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data too. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in parameter, $\eta$-H\"older smoothness in data, etc., we prove that the federated oracle complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. Then, we show that when $d$ is small and the loss function is sufficiently smooth in the data, FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.
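
For context, here is a minimal sketch of the FedAve baseline referred to above, assuming a least-squares client loss, full client participation, and fixed step counts and step sizes (illustrative simplifications, not the paper's setup); the proposed FedLRGD would instead spend a few rounds learning weights to approximate clients' gradients and then run inexact gradient descent entirely at the server:

```python
import numpy as np

def fedave(client_data, n_rounds=50, local_steps=5, lr=0.1):
    """Minimal FedAve loop: each round, every client takes a few gradient
    steps on its own least-squares loss starting from the server model, and
    the server averages the resulting parameter vectors."""
    p = client_data[0][0].shape[1]
    w = np.zeros(p)
    for _ in range(n_rounds):
        local_ws = []
        for X, y in client_data:
            w_local = w.copy()
            for _ in range(local_steps):
                grad = 2.0 * X.T @ (X @ w_local - y) / len(y)
                w_local -= lr * grad
            local_ws.append(w_local)
        w = np.mean(local_ws, axis=0)  # server-side averaging step
    return w

# Toy usage: m = 4 clients whose data come from one shared linear model.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
clients = []
for _ in range(4):
    X = rng.normal(size=(100, 5))
    clients.append((X, X @ w_true + 0.05 * rng.normal(size=100)))
w_hat = fedave(clients)
```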
