Deep retrieval models are widely used for learning entity representations and making recommendations. Federated learning provides a privacy-preserving way to train these models without centralizing user data. However, federated deep retrieval models usually perform much worse than their centralized counterparts because the training data on clients is non-IID (not independent and identically distributed), an intrinsic property of federated learning that limits the negatives available for training. We demonstrate that this issue is distinct from the commonly studied client drift problem. This work proposes batch-insensitive losses as a way to alleviate the non-IID negatives issue for federated movie recommendation. We explore a variety of techniques and find that batch-insensitive losses can effectively improve the performance of federated deep retrieval models, increasing the recall of the federated model by up to 93.15% (relative) and reducing the relative recall gap between it and a centralized model from 27.22%–43.14% to 0.53%–2.42%. We open-source our code framework to accelerate further research and applications of federated deep retrieval models.
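The core intuition, that a loss drawing its negatives from the local batch inherits the client's non-IID data, can be illustrated with a toy comparison. This is a minimal sketch, not the method of this work: it assumes a shared item embedding table (`catalog`, with hypothetical values) and uses a full-catalog softmax as one example of a batch-insensitive loss, contrasting it with an in-batch softmax whose negatives are the other items in the same client batch. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_softmax_at(logits, index):
    # Numerically stable log-softmax of logits[index].
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def in_batch_softmax_loss(query, pos_id, batch_ids, catalog):
    """Batch-sensitive: the negatives are the other items in this batch."""
    logits = [dot(query, catalog[j]) for j in batch_ids]
    return -log_softmax_at(logits, batch_ids.index(pos_id))


def catalog_softmax_loss(query, pos_id, batch_ids, catalog):
    """Batch-insensitive (assumed example): negatives come from the whole
    shared catalog, so batch_ids is deliberately unused."""
    logits = [dot(query, item) for item in catalog]
    return -log_softmax_at(logits, pos_id)


# Hypothetical 4-item catalog with 2-d embeddings, shared by all clients.
catalog = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]  # the positive item for this query is item 0

# Non-IID client batch: the only in-batch negative resembles the positive.
batch_a = [0, 1]
# Batch from another client with a very different negative.
batch_b = [0, 3]

loss_a = in_batch_softmax_loss(query, 0, batch_a, catalog)
loss_b = in_batch_softmax_loss(query, 0, batch_b, catalog)
full_a = catalog_softmax_loss(query, 0, batch_a, catalog)
full_b = catalog_softmax_loss(query, 0, batch_b, catalog)

print(f"in-batch softmax, batch A: {loss_a:.4f}")
print(f"in-batch softmax, batch B: {loss_b:.4f}")
print(f"catalog softmax, batch A:  {full_a:.4f}")
print(f"catalog softmax, batch B:  {full_b:.4f}")
```

For the same query and positive, the in-batch loss changes with batch composition, while the catalog softmax does not. A full-catalog softmax is only practical for small catalogs; sampling negatives from a fixed global distribution is a common alternative. Neither is claimed here to be the exact loss studied in this work.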