Federated Learning (FL) is an emerging machine learning framework that enables multiple clients, coordinated by a server, to collaboratively train a global model by aggregating locally trained models without sharing any client's training data. Recent works have observed that learning in a federated manner may lead the aggregated global model to converge to a 'sharp minimum,' thereby adversely affecting the generalization of the FL-trained model. In this work, we therefore aim to improve the generalization performance of models trained in a federated setup by introducing a 'flatness'-constrained FL optimization problem, where the flatness constraint is imposed on the top eigenvalue of the Hessian of the training loss. Since each client trains a model on its local data, we reformulate this complex problem in terms of the client loss functions and propose a new, computationally efficient regularization technique, dubbed 'MAN,' which Minimizes the Activation Norm of each layer in client-side models. We also show theoretically that minimizing the activation norm reduces the top eigenvalue of the layer-wise Hessian of the client's loss, which in turn decreases the top eigenvalue of the overall Hessian, ensuring convergence to a flat minimum. Applying our proposed flatness-constrained optimization to existing FL techniques yields significant improvements, establishing a new state of the art.
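To make the MAN idea concrete, the following is a minimal PyTorch sketch of how an activation-norm penalty could be attached to a client's local training loop. It is an illustrative assumption, not the paper's implementation: the class name MANRegularizer, the choice to hook all leaf modules, and the coefficient lam are hypothetical.

import torch
import torch.nn as nn

class MANRegularizer:
    """Collects per-layer squared activation norms via forward hooks (sketch)."""

    def __init__(self, model: nn.Module):
        self.activation_sq_norms = []
        self.handles = []
        for module in model.modules():
            # Hook leaf modules only; which layers to penalize is an assumption.
            if len(list(module.children())) == 0:
                self.handles.append(module.register_forward_hook(self._hook))

    def _hook(self, module, inputs, output):
        # Accumulate the squared L2 norm of this layer's output activations.
        if torch.is_tensor(output):
            self.activation_sq_norms.append(output.pow(2).sum())

    def penalty(self) -> torch.Tensor:
        # Sum of squared activation norms across layers for the current batch.
        total = sum(self.activation_sq_norms)
        self.activation_sq_norms = []  # reset for the next forward pass
        return total

def local_step(model, batch, targets, optimizer, criterion, reg, lam=1e-4):
    # One client-side update: task loss plus the MAN penalty, scaled by lam
    # (a hypothetical regularization coefficient).
    optimizer.zero_grad()
    logits = model(batch)  # forward pass fires the hooks
    loss = criterion(logits, targets) + lam * reg.penalty()
    loss.backward()
    optimizer.step()
    return loss.item()

Under this sketch, each client would run local_step during its local epochs before the server aggregates the resulting models, so the flatness-inducing penalty is paid entirely on the client side.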