Next Generation (NextG) networks are expected to support demanding tactile internet applications such as augmented reality and connected autonomous vehicles. While recent innovations bring the promise of larger link capacity, their sensitivity to the environment and erratic performance defy traditional model-based control rationales. Zero-touch data-driven approaches can improve the ability of the network to adapt to current operating conditions. Tools such as reinforcement learning (RL) algorithms can build an optimal control policy solely from a history of observations. Specifically, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments and with high-dimensional inputs. However, training DRL models requires a large amount of data, which may limit their adaptability to the ever-evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems, where centralized DRL approaches would require excessive data exchange, while fully distributed approaches may suffer from slower convergence and degraded performance. In this paper, to address these challenges, we propose a federated learning (FL) approach to DRL, which we refer to as federated DRL (F-DRL), in which base stations (BSs) collaboratively train the embedded DNN by sharing only model weights rather than training data. We evaluate two distinct versions of F-DRL, value-based and policy-based, and show the superior performance they achieve compared to distributed and centralized DRL.
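To make the weight-sharing idea concrete, the sketch below illustrates a FedAvg-style aggregation of per-BS value networks; this is only an assumed illustration of the federation step (the paper's exact aggregation rule, network architecture, and names such as `DQN` and `federated_average` are hypothetical here), showing that only DNN weights, never local experience, are exchanged.

```python
# Minimal sketch, assuming FedAvg-style aggregation of per-BS Q-networks.
# "DQN", "federated_average", and all dimensions are illustrative placeholders.
import copy
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Hypothetical per-BS value network (value-based F-DRL variant)."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def federated_average(local_models):
    """Average the DNN weights of all base stations' local models.

    Only model weights are exchanged; the locally collected transitions
    (observations, actions, rewards) never leave each base station.
    """
    global_state = copy.deepcopy(local_models[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in local_models], dim=0
        ).mean(dim=0)
    return global_state

# Usage: each BS trains its local network on its own experience for a few
# rounds, then the averaged weights are broadcast back to every BS.
local_q_networks = [DQN(obs_dim=8, n_actions=4) for _ in range(3)]
global_weights = federated_average(local_q_networks)
for model in local_q_networks:
    model.load_state_dict(global_weights)
```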