This paper proposes Stochastic Geographic Gradient Fusion (SGFusion), a novel training algorithm that leverages the geographic information of mobile users in Federated Learning (FL). SGFusion maps the data collected by mobile devices onto geographical zones and trains one FL model per zone, so that each model adapts to the data and behaviors of the users in its zone. SGFusion models the correlation among geographical zones, computed from their local data, as a hierarchical random graph (HRG) optimized by Markov Chain Monte Carlo sampling. At each training step, every zone fuses its local gradient with gradients derived from a small set of other zones sampled from the HRG. This probabilistic, stochastic gradient fusion process with self-attention weights enables knowledge sharing among geographical zones, such that more similar zones have higher probabilities of sharing gradients and do so with larger attention weights. SGFusion improves model utility without introducing undue computational cost. Theoretical analysis and experiments on a heart-rate prediction dataset collected across 6 countries show that models trained with SGFusion converge with upper-bounded expected errors and significantly improve utility in all countries compared to existing approaches, without notable cost in system scalability.
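The fusion step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `share_prob` table is a hypothetical stand-in for the zone similarity that SGFusion derives from its HRG, the scaled dot-product attention score is an assumption, and all names and values are illustrative only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def fuse_gradient(zone, grads, share_prob, rng):
    """Fuse one zone's gradient with gradients from randomly sampled zones.

    grads:      dict zone -> local gradient (list of floats)
    share_prob: dict (zone, other) -> sharing probability; a hypothetical
                stand-in for the similarity the paper derives from its HRG
    """
    # Sample peers: zones with a higher sharing probability are picked more often.
    peers = [z for z in grads
             if z != zone and rng.random() < share_prob[(zone, z)]]
    members = [zone] + peers

    # Assumed attention score: scaled dot product with the zone's own gradient.
    scale = math.sqrt(len(grads[zone]))
    scores = [dot(grads[zone], grads[m]) / scale for m in members]
    weights = softmax(scores)

    # The fused gradient is the attention-weighted sum of the member gradients.
    dim = len(grads[zone])
    fused = [sum(w * grads[m][i] for w, m in zip(weights, members))
             for i in range(dim)]
    return fused, dict(zip(members, weights))


# Toy gradients for three zones; the probabilities are illustrative only.
grads = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [-1.0, 0.5]}
share_prob = {("A", "B"): 1.0, ("A", "C"): 0.0}
fused, weights = fuse_gradient("A", grads, share_prob, random.Random(0))
```

In this toy run, zone A always samples the similar zone B and never the dissimilar zone C, so the fused gradient is a weighted mix of A's and B's gradients in which A's own gradient receives the larger weight.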