Federated learning is an emerging learning paradigm in which multiple clients collaboratively train a machine learning model in a privacy-preserving manner. Personalized federated learning extends this paradigm by learning personalized models to overcome heterogeneity across clients. Recently, there have been initial attempts to apply Transformers to federated learning. However, the impact of federated learning algorithms on self-attention has not yet been studied. This paper investigates this relationship and reveals that federated averaging actually has a negative impact on self-attention in the presence of data heterogeneity, which limits the capabilities of the Transformer model in federated learning settings. Based on this, we propose FedTP, a novel Transformer-based federated learning framework that learns personalized self-attention for each client while aggregating the remaining parameters across clients. Instead of a vanilla personalization mechanism that keeps each client's self-attention layers local, we develop a learn-to-personalize mechanism that further encourages cooperation among clients and improves the scalability and generalization of FedTP. Specifically, learn-to-personalize is realized by training a hypernetwork on the server that outputs the personalized projection matrices of the self-attention layers, which generate client-wise queries, keys, and values. Furthermore, we present a generalization bound for FedTP with the learn-to-personalize mechanism. Notably, FedTP offers a convenient environment for performing a range of image and language tasks using the same federated network architecture, all of which benefit from Transformer personalization. Extensive experiments verify that FedTP with the learn-to-personalize mechanism achieves state-of-the-art performance in non-IID scenarios. Our code is available online.
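To make the learn-to-personalize mechanism concrete, the sketch below shows how a server-side hypernetwork could map a learnable client embedding to the query/key/value projection matrices of each self-attention layer, so that attention is client-specific while the remaining Transformer parameters are shared and aggregated as usual. This is a minimal illustration in PyTorch; all class names, dimensions, and the MLP structure are assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class AttentionHypernetwork(nn.Module):
    """Illustrative hypernetwork: client embedding -> per-client Q/K/V projections."""

    def __init__(self, num_clients: int, embed_dim: int, d_model: int,
                 num_layers: int, hidden_dim: int = 128):
        super().__init__()
        self.d_model = d_model
        self.num_layers = num_layers
        # One learnable embedding per client, maintained on the server.
        self.client_embeddings = nn.Embedding(num_clients, embed_dim)
        # MLP head that emits all Q/K/V projection weights for every
        # self-attention layer of the client's Transformer.
        out_dim = num_layers * 3 * d_model * d_model
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, client_id: torch.Tensor):
        """Returns a list of (W_q, W_k, W_v) tuples, one per attention layer."""
        z = self.client_embeddings(client_id)                 # (embed_dim,)
        flat = self.mlp(z)                                    # (num_layers * 3 * d^2,)
        w = flat.view(self.num_layers, 3, self.d_model, self.d_model)
        return [(w[l, 0], w[l, 1], w[l, 2]) for l in range(self.num_layers)]


# Hypothetical usage: before each round the server generates client-specific
# attention projections and sends them, together with the shared parameters,
# to the selected client; after local training, gradients with respect to the
# generated projections flow back into the hypernetwork, while the remaining
# parameters are averaged across clients as in standard federated averaging.
hypernet = AttentionHypernetwork(num_clients=100, embed_dim=32, d_model=64, num_layers=4)
qkv_weights = hypernet(torch.tensor(7))  # projection matrices for client 7
```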