In this paper, the problem of data pre-storing and routing in dynamic, resource-constrained cube satellite networks is studied. In such a network, each cube satellite delivers requested data to the user clusters under its coverage. A group of ground gateways routes and pre-stores data at the satellites so that ground users can be served directly from the pre-stored data. This pre-storing and routing design problem is formulated as a decentralized Markov decision process (Dec-MDP) whose goal is to find the optimal strategy that maximizes the pre-store hit rate, i.e., the fraction of users served directly from the pre-stored data. To obtain the optimal strategy, a distributed distribution-robust meta reinforcement learning (D2-RMRL) algorithm is proposed that consists of three key ingredients: value decomposition, to achieve the global optimum in a distributed setting with minimal communication overhead; meta-learning, to obtain an optimal model initialization that reduces the training time under dynamic conditions; and pre-training, to further speed up the meta-training procedure. Simulation results show that, using the proposed value decomposition and meta-training techniques, the satellite network achieves a 31.8% improvement in the pre-store hit rate and a 40.7% improvement in convergence speed, compared to a baseline reinforcement learning algorithm. Moreover, the proposed pre-training mechanism shortens the meta-learning procedure by up to 43.7%.
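The abstract credits value decomposition with letting each gateway act on local information while training still optimizes a global objective. The paper's exact architecture is not given here, so the following is a minimal, hypothetical PyTorch sketch of an additive (VDN-style) decomposition; `AgentQNetwork`, `joint_q_value`, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AgentQNetwork(nn.Module):
    """Per-gateway Q-network over local observations.
    Layer sizes are illustrative assumptions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns Q-values for every local action.
        return self.net(obs)

def joint_q_value(agent_nets, observations, actions):
    """VDN-style additive decomposition: the joint Q is the sum of
    per-agent Q-values, so the team objective can be trained end to end
    while each gateway still acts greedily on its own local Q-values."""
    q_total = 0.0
    for net, obs, act in zip(agent_nets, observations, actions):
        q_all = net(obs)                                       # (batch, n_actions)
        q_total = q_total + q_all.gather(-1, act.unsqueeze(-1)).squeeze(-1)
    return q_total                                             # (batch,)

# Hypothetical usage: three gateways, a batch of 5 transitions.
nets = [AgentQNetwork(obs_dim=8, n_actions=4) for _ in range(3)]
obs = [torch.randn(5, 8) for _ in range(3)]
acts = [torch.randint(0, 4, (5,)) for _ in range(3)]
q_tot = joint_q_value(nets, obs, acts)  # shape: (5,)
```

Because the joint value is a plain sum of per-agent terms, greedy action selection decomposes agent by agent, which is consistent with the abstract's claim that the global optimum can be pursued with minimal communication overhead at execution time.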