Federated Learning is a popular approach for distributed learning due to its security and computational benefits. With the advent of powerful devices at the network edge, Gossip Learning further decentralizes Federated Learning by removing centralized aggregation and relying entirely on peer-to-peer updates. However, the averaging methods generally used in both Federated and Gossip Learning are suboptimal for model accuracy and global convergence. Moreover, there are few options for deploying learning workloads at the edge as part of a larger application using a declarative approach such as Kubernetes manifests. This paper proposes Delta Sum Learning as a method to improve the basic aggregation operation in Gossip Learning, and implements it in a decentralized orchestration framework based on the Open Application Model, which allows for dynamic node discovery and intent-driven deployment of multi-workload applications. Evaluation results show that Delta Sum performs on par with alternative integration methods on 10-node topologies, but yields a 58% smaller drop in global accuracy when scaling to 50 nodes. Overall, it shows strong global convergence under limited connectivity, with accuracy degrading logarithmically with topology size compared to the linear degradation of the alternatives.
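For context, the "basic aggregation operation" that the abstract refers to is the pairwise parameter averaging commonly used in Gossip Learning, which Delta Sum is proposed to replace. The sketch below is a minimal illustration of that baseline only; the function name and its arguments are hypothetical and do not reflect the framework's API, and the Delta Sum operation itself is defined later in the paper.

    import numpy as np

    def gossip_average(local_params, peer_params):
        """Baseline gossip aggregation: pairwise averaging of model parameters.

        local_params, peer_params: lists of numpy arrays, one per model layer.
        Returns the element-wise mean, which both peers would adopt as their
        new local model after exchanging updates.
        """
        return [(lp + pp) / 2.0 for lp, pp in zip(local_params, peer_params)]

    # Illustrative use with two single-layer "models".
    local = [np.array([0.2, 0.8])]
    peer = [np.array([0.6, 0.4])]
    print(gossip_average(local, peer))  # [array([0.4, 0.6])]

Because each exchange simply averages two views of the model, information from distant peers is diluted as the topology grows, which is the convergence limitation the paper targets.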