Prior solutions for mitigating Byzantine failures in federated learning, such as taking the element-wise median of the stochastic gradient descent (SGD) based updates from the clients, tend to rely on the similarity of updates from the non-Byzantine clients. However, when data is non-IID, as is typical in mobile networks, the updates received from non-Byzantine clients are quite diverse, and such approaches converge poorly. On the other hand, current algorithms that address heterogeneous data distribution across clients are limited in scope and do not perform well when the number and identities of the Byzantine clients vary, or when general non-convex loss functions are considered. We propose 'DiverseFL', which jointly addresses three key challenges of Byzantine-resilient federated learning: (i) non-IID data distribution across clients, (ii) a variable Byzantine fault model, and (iii) generalization to non-convex and non-smooth optimization. DiverseFL leverages the computing capability of the federated learning server, which in each iteration computes a 'guiding' gradient for each client over a tiny sample of data received only once from the client before the start of training. The server applies 'per-client' criteria to flag Byzantine clients, comparing each client's gradient update with the corresponding guiding gradient. The server then updates the model using the gradients received from the non-flagged clients. As we demonstrate in experiments with benchmark datasets and popular Byzantine attacks, the proposed approach outperforms prior algorithms and almost matches the performance of 'Oracle SGD', in which the server knows the identities of the Byzantine clients.
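The per-client flagging step described above can be illustrated with a minimal sketch. The abstract does not state the actual criteria, so everything specific here is an assumption made for illustration: the cosine-similarity test, the norm-ratio band, the threshold values, plain averaging of the surviving gradients, and all function names (`is_flagged`, `aggregate`, `sgd_step`) are hypothetical and are not taken from the DiverseFL paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a gradient vector."""
    return math.sqrt(dot(u, u))


def is_flagged(client_grad, guiding_grad, min_cosine=0.0, norm_ratio_band=(0.5, 2.0)):
    """Flag a client by comparing its update with the server's guiding gradient.

    ASSUMPTION: both tests and their thresholds are illustrative stand-ins for
    the 'per-client' criteria, which the abstract does not spell out.
    """
    c_norm, g_norm = norm(client_grad), norm(guiding_grad)
    if c_norm == 0.0 or g_norm == 0.0:
        return True  # hypothetical choice: directions cannot be compared, so flag
    cosine = dot(client_grad, guiding_grad) / (c_norm * g_norm)
    ratio = c_norm / g_norm
    low, high = norm_ratio_band
    return cosine <= min_cosine or not (low <= ratio <= high)


def aggregate(client_grads, guiding_grads):
    """Average the gradients of non-flagged clients.

    Both arguments map a client id to a gradient vector (list of floats).
    Returns (mean_gradient_or_None, list_of_kept_client_ids).
    ASSUMPTION: simple unweighted averaging of the kept updates.
    """
    kept = [cid for cid in client_grads
            if not is_flagged(client_grads[cid], guiding_grads[cid])]
    if not kept:
        return None, kept
    dim = len(client_grads[kept[0]])
    mean = [sum(client_grads[cid][i] for cid in kept) / len(kept) for i in range(dim)]
    return mean, kept


def sgd_step(params, grad, lr=0.1):
    """One plain SGD update of the model parameters."""
    return [p - lr * g for p, g in zip(params, grad)]
```

In this sketch the decision for each client depends only on that client's own update and its own guiding gradient, so it does not require honest clients to resemble one another, which is the property the abstract highlights for non-IID data.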