Mobile edge computing (MEC) is regarded as a promising wireless access architecture for alleviating the intensive computation burden on resource-limited mobile terminals (MTs). Allowing MTs to offload part of their tasks to MEC servers can significantly reduce task processing delay. In this study, to minimize the processing delay of a multi-user MEC system, we jointly optimize the local content splitting ratio, the transmission/computation power allocation, and the MEC server selection in a dynamic environment with time-varying task arrivals and wireless channels. Reinforcement learning (RL) is employed to address the considered problem. Two deep RL strategies, namely the deep Q-learning network (DQN) and the deep deterministic policy gradient (DDPG), are proposed to efficiently learn the offloading policies in an adaptive manner. The proposed DQN strategy treats the MEC server selection as its only action and obtains the remaining variables via a convex optimization approach, whereas the DDPG strategy takes all dynamic variables as actions. Numerical results demonstrate that both proposed strategies outperform existing schemes, and that the DDPG strategy is superior to the DQN strategy because it learns all variables online, albeit at relatively higher computational complexity.
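To make the contrast between the two action spaces concrete, the following is a minimal sketch, not the authors' implementation: all names, bounds, dimensions, and the convex_subsolver hook are illustrative assumptions. It shows how a DQN-style agent would learn only the discrete server index and delegate the continuous variables to a convex subproblem, while a DDPG-style actor would emit the full action vector directly.

```python
# Illustrative sketch of the two action structures described in the abstract.
# NUM_SERVERS, P_TX_MAX, and P_CPU_MAX are assumed placeholder constants.
from dataclasses import dataclass
import numpy as np

NUM_SERVERS = 3      # assumed number of candidate MEC servers
P_TX_MAX = 1.0       # assumed transmit-power budget
P_CPU_MAX = 1.0      # assumed local computation-power budget

@dataclass
class OffloadDecision:
    split_ratio: float   # fraction of the task kept for local computing, in [0, 1]
    p_tx: float          # transmit power for offloading, in [0, P_TX_MAX]
    p_cpu: float         # local computation power, in [0, P_CPU_MAX]
    server: int          # index of the selected MEC server

def dqn_style_decision(q_values: np.ndarray, convex_subsolver) -> OffloadDecision:
    """DQN strategy: only the discrete server selection is a learned action;
    the continuous variables come from a convex subproblem solver (hypothetical
    hook passed in by the caller)."""
    server = int(np.argmax(q_values))               # greedy over learned Q-values
    split_ratio, p_tx, p_cpu = convex_subsolver(server)
    return OffloadDecision(split_ratio, p_tx, p_cpu, server)

def ddpg_style_decision(actor_output: np.ndarray) -> OffloadDecision:
    """DDPG strategy: the actor outputs the full action vector; continuous
    outputs are squashed into their feasible ranges, and the server choice
    is recovered by discretizing one output dimension."""
    a = 1.0 / (1.0 + np.exp(-actor_output))         # squash raw outputs to (0, 1)
    return OffloadDecision(
        split_ratio=float(a[0]),
        p_tx=float(a[1] * P_TX_MAX),
        p_cpu=float(a[2] * P_CPU_MAX),
        server=min(int(a[3] * NUM_SERVERS), NUM_SERVERS - 1),
    )

# Example usage with arbitrary raw actor outputs:
decision = ddpg_style_decision(np.array([0.3, -0.5, 1.2, 0.8]))
print(decision)
```

This also reflects the complexity trade-off noted in the abstract: the DDPG-style agent must learn the entire joint action online, whereas the DQN-style agent only searches over the discrete server set and offloads the continuous optimization to a per-step convex solve.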