In this paper, we identify a new phenomenon, called activation-divergence, which occurs in Federated Learning (FL) due to data heterogeneity (i.e., non-IID data) across multiple users. Specifically, we argue that the activation vectors in FL can diverge, even when subsets of users share a few common classes, because their data reside on different devices. To address the activation-divergence issue, we introduce a prior based on the principle of maximum entropy; this prior assumes minimal information about the per-device activation vectors and aims at making the activation vectors of the same classes as similar as possible across multiple devices. Our results show that, for both IID and non-IID settings, our proposed approach yields better accuracy (due to the significantly more similar activation vectors across devices) and is more communication-efficient than state-of-the-art approaches in FL. Finally, we illustrate the effectiveness of our approach on several common benchmarks and two large medical datasets.
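The abstract describes the maximum-entropy prior only at a high level. As a concrete illustration, below is a minimal PyTorch-style sketch of one natural realization of such a prior: a per-device regularizer that penalizes the KL divergence between the softmax of the activation vectors and the uniform distribution (the maximum-entropy distribution over a finite set), so that every device pulls its activations toward the same minimal-information target. The function name max_entropy_prior_loss, the weight beta, and the placement of the regularizer are illustrative assumptions, not the paper's notation.

```python
import torch
import torch.nn.functional as F

def max_entropy_prior_loss(activations: torch.Tensor) -> torch.Tensor:
    """KL divergence between the uniform distribution and the softmax of
    the activation vectors. Minimizing this term pushes every device's
    activations toward the same maximum-entropy (minimal-information)
    configuration, keeping same-class activations similar across devices
    without exchanging any raw data. (Illustrative sketch, not the
    paper's exact formulation.)
    """
    log_p = F.log_softmax(activations, dim=1)            # log-probabilities
    uniform = torch.full_like(log_p, 1.0 / activations.shape[1])
    # F.kl_div(input, target) computes KL(target || input) with `input`
    # given in log-space; here the target is the uniform distribution.
    return F.kl_div(log_p, uniform, reduction="batchmean")

if __name__ == "__main__":
    # Hypothetical per-device training step: cross-entropy task loss plus
    # the max-entropy prior on the activations feeding the classifier.
    beta = 1.0                          # illustrative regularization weight
    acts = torch.randn(8, 16)           # batch of 8 activation vectors
    logits = torch.randn(8, 10)         # classifier outputs (placeholder)
    labels = torch.randint(0, 10, (8,))
    loss = F.cross_entropy(logits, labels) + beta * max_entropy_prior_loss(acts)
    print(loss.item())
```

Because every device regularizes toward the same uniform target, no activation statistics need to be communicated for this term, which is consistent with the communication-efficiency claim in the abstract.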