Federated learning (FL) is a collaborative machine learning framework in which different clients (e.g., Internet of Things devices) participate in the model training process by training local models and uploading them to an FL server in each global iteration. Upon receiving the local models from all the clients, the FL server generates a global model by aggregating the received local models. This traditional FL process may suffer from the straggler problem in heterogeneous client settings, where the FL server has to wait for slow clients to upload their local models in each global iteration, thus increasing the overall training time. One solution is to set a deadline so that only the clients that can upload their local models before the deadline are selected in the FL process. This solution may lead to a slow convergence rate and global model overfitting due to the limited client selection. In this paper, we propose the Latency awarE Semi-synchronous client Selection and mOdel aggregation for federated learNing (LESSON) method, which allows all the clients to participate in the whole FL process but with different frequencies. That is, faster clients are scheduled to upload their models more frequently than slower clients, thus resolving the straggler problem and accelerating the convergence speed while avoiding model overfitting. Also, LESSON is capable of adjusting the tradeoff between model accuracy and convergence rate by varying the deadline. Extensive simulations have been conducted to compare the performance of LESSON with two baseline methods, i.e., FedAvg and FedCS. The simulation results demonstrate that LESSON achieves faster convergence speed than FedAvg and FedCS, and higher model accuracy than FedCS.