Cellular-Connected UAVs over 5G: Deep Reinforcement Learning for Interference Management

In this paper, an interference-aware path planning scheme for a network of cellular-connected unmanned aerial vehicles (UAVs) is proposed. In particular, each UAV aims to achieve a tradeoff between maximizing energy efficiency and minimizing both its wireless latency and the interference it causes on the ground network along its path. The problem is cast as a dynamic game among UAVs. To solve this game, a deep reinforcement learning algorithm based on echo state network (ESN) cells is proposed. The introduced deep ESN architecture is trained to allow each UAV to map each observation of the network state to an action, with the goal of minimizing a sequence of time-dependent utility functions. Each UAV uses an ESN to learn its optimal path, transmission power level, and cell association vector at different locations along its path. The proposed algorithm is shown to reach a subgame perfect Nash equilibrium (SPNE) upon convergence. Moreover, upper and lower bounds on the altitude of the UAVs are derived, thus reducing the computational complexity of the proposed algorithm. Simulation results show that the proposed scheme achieves better wireless latency per UAV and rate per ground user equipment (UE), while requiring a number of steps comparable to that of a heuristic baseline in which each UAV moves along the shortest path towards its destination. The results also show that the optimal altitude of the UAVs varies with the ground network density and the UE data rate requirements, and that it plays a vital role in minimizing both the interference level on the ground UEs and the wireless transmission delay of the UAV.
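To make the observation-to-action mapping concrete, the following is a minimal sketch of a generic leaky echo state network, not the deep ESN architecture of the paper. The class name `EchoStateNetwork`, the helper `select_action`, the action labels, and all parameter values are assumptions chosen for illustration; the abstract does not specify them. The reservoir weights are random and fixed, and only the readout `w_out` would be trained (training is omitted here).

```python
import math
import random


class EchoStateNetwork:
    """Minimal leaky echo state network (illustrative, hypothetical names)."""

    def __init__(self, n_inputs, n_reservoir, n_outputs,
                 leak=0.3, scale=0.9, seed=0):
        rng = random.Random(seed)
        # Fixed random input weights; these are never trained.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_reservoir)]
        # Fixed random recurrent weights, rescaled so that the largest
        # absolute row sum equals `scale` (< 1). This bounds the spectral
        # radius below 1 and keeps the state update a contraction.
        w = [[rng.uniform(-1, 1) for _ in range(n_reservoir)]
             for _ in range(n_reservoir)]
        max_row = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * scale / max_row for v in row] for row in w]
        # Readout weights: the only trainable part (left at zero here).
        self.w_out = [[0.0] * n_reservoir for _ in range(n_outputs)]
        self.leak = leak
        self.state = [0.0] * n_reservoir

    def step(self, observation):
        """Update the reservoir with one observation; return the readout."""
        pre = [
            sum(wi * u for wi, u in zip(self.w_in[i], observation))
            + sum(wr * x for wr, x in zip(self.w[i], self.state))
            for i in range(len(self.state))
        ]
        # Leaky integration of the tanh-activated pre-activations.
        self.state = [(1 - self.leak) * x + self.leak * math.tanh(p)
                      for x, p in zip(self.state, pre)]
        return [sum(wo * x for wo, x in zip(row, self.state))
                for row in self.w_out]


def select_action(esn, observation, actions):
    """Pick the action whose readout (an assumed cost estimate) is lowest."""
    costs = esn.step(observation)
    best = min(range(len(actions)), key=lambda i: costs[i])
    return actions[best]


# Hypothetical usage: a 3-feature observation and four movement actions.
esn = EchoStateNetwork(n_inputs=3, n_reservoir=8, n_outputs=4)
action = select_action(esn, [0.1, -0.2, 0.3],
                       ["forward", "left", "right", "up"])
```

In this sketch each readout value is treated as an estimated cost for one action, matching the abstract's framing of minimizing utility functions. A fuller version would also cover transmission power and cell association, which the abstract lists as learned quantities.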

