In this paper, the problem of data pre-storing and routing in dynamic, resource-constrained cube satellite networks is studied. In such a network, each cube satellite delivers requested data to the user clusters under its coverage. A group of ground gateways routes and pre-stores data at the satellites so that ground users can be served directly from the pre-stored data. This pre-storing and routing design problem is formulated as a decentralized Markov decision process (Dec-MDP) whose goal is to find the optimal strategy that maximizes the pre-store hit rate, i.e., the fraction of users served directly from the pre-stored data. To obtain the optimal strategy, a distributed distribution-robust meta reinforcement learning (D2-RMRL) algorithm is proposed that consists of three key ingredients: value decomposition, to achieve the global optimum in a distributed setting with minimal communication overhead; meta-learning, to obtain an optimal model initialization that reduces the training time under dynamic conditions; and pre-training, to further speed up the meta-training procedure. Simulation results show that, using the proposed value decomposition and meta-training techniques, the satellite network achieves a 31.8% improvement in the pre-store hit rate and a 40.7% improvement in convergence speed, compared to a baseline reinforcement learning algorithm. Moreover, the proposed pre-training mechanism shortens the meta-learning procedure by up to 43.7%.
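The abstract credits value decomposition with letting each gateway act on local information while training still optimizes a global objective. The paper's exact architecture is not given here, so the following is a minimal, hypothetical PyTorch sketch of an additive (VDN-style) decomposition; `AgentQNetwork`, `joint_q_value`, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AgentQNetwork(nn.Module):
    """Per-gateway Q-network over local observations.
    Layer sizes are illustrative assumptions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns Q-values for every local action.
        return self.net(obs)

def joint_q_value(agent_nets, observations, actions):
    """VDN-style additive decomposition: the joint Q is the sum of
    per-agent Q-values, so the team objective can be trained end to end
    while each gateway still acts greedily on its own local Q-values."""
    q_total = 0.0
    for net, obs, act in zip(agent_nets, observations, actions):
        q_all = net(obs)                                       # (batch, n_actions)
        q_total = q_total + q_all.gather(-1, act.unsqueeze(-1)).squeeze(-1)
    return q_total                                             # (batch,)

# Hypothetical usage: three gateways, a batch of 5 transitions.
nets = [AgentQNetwork(obs_dim=8, n_actions=4) for _ in range(3)]
obs = [torch.randn(5, 8) for _ in range(3)]
acts = [torch.randint(0, 4, (5,)) for _ in range(3)]
q_tot = joint_q_value(nets, obs, acts)  # shape: (5,)
```

Because the joint value is a plain sum of per-agent terms, greedy action selection decomposes agent by agent, which is consistent with the abstract's claim that the global optimum can be pursued with minimal communication overhead at execution time.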