Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most widely used algorithms for FL are parameter-averaging schemes (e.g., Federated Averaging) that, however, have well-known limitations: (i) clients must implement the same model architecture; (ii) transmitting model weights and model updates implies a high communication cost, which scales with the number of model parameters; (iii) in the presence of non-IID data distributions, parameter-averaging aggregation schemes perform poorly due to client model drift. Federated adaptations of regular Knowledge Distillation (KD) can solve or mitigate the weaknesses of parameter-averaging FL algorithms while possibly introducing other trade-offs. In this article, we provide a review of KD-based algorithms tailored to specific FL issues.
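To make the contrast concrete, the sketch below (not the article's method; all names and sizes are illustrative assumptions) juxtaposes FedAvg-style weight averaging, whose upload size scales with the parameter count and requires identical architectures, with a KD-style exchange where clients share only their logits on a shared public batch, so communication scales with batch size times the number of classes and architectures may differ.

```python
# Minimal sketch, assuming NumPy arrays stand in for model weights and logits.
# Hypothetical quantities: num_clients, num_params, num_classes, public_batch.
import numpy as np

rng = np.random.default_rng(0)
num_clients, num_params, num_classes, public_batch = 3, 10_000, 5, 32

# --- Parameter-averaging aggregation (FedAvg-style) ----------------------
# Each client uploads its full weight vector: communication grows with the
# number of model parameters and all clients need the same architecture.
client_weights = [rng.normal(size=num_params) for _ in range(num_clients)]
client_sizes = np.array([100, 200, 300])              # local dataset sizes
coeffs = client_sizes / client_sizes.sum()            # FedAvg weighting
global_weights = sum(c * w for c, w in zip(coeffs, client_weights))

# --- KD-style aggregation -------------------------------------------------
# Each client uploads only its logits on a shared public batch: communication
# scales with (batch size x classes), and architectures may be heterogeneous.
client_logits = [rng.normal(size=(public_batch, num_classes))
                 for _ in range(num_clients)]
avg_logits = np.mean(client_logits, axis=0)

def softmax(z, T=1.0):
    """Temperature-scaled softmax used to form soft targets."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Soft targets that a server or the clients would distill from.
soft_targets = softmax(avg_logits, T=3.0)
print(global_weights.shape, soft_targets.shape)       # (10000,) (32, 5)
```

In this toy setting the per-round upload drops from 10,000 floats per client to 32 x 5 logits, which is the communication argument behind many of the KD-based schemes reviewed in the article.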