The study and benchmarking of Deep Reinforcement Learning (DRL) models has become a trend in many industries, including aerospace engineering and communications. Recent studies in these fields propose such models to address complex real-time decision-making problems in which classical approaches either do not meet time requirements or fail to obtain optimal solutions. While the good performance of DRL models has been demonstrated for specific use cases or scenarios, most studies do not discuss the compromises and generalizability of such models during real operations. In this paper we explore the tradeoffs of different elements of DRL models and how they might impact the final performance. To that end, we choose the Frequency Plan Design (FPD) problem in the context of multibeam satellite constellations as our use case and propose a DRL model to address it. We identify six core elements that have a major effect on its performance: the policy, the policy optimizer, the state, action, and reward representations, and the training environment. We analyze different alternatives for each of these elements and characterize their effects. We also use multiple environments to account for different scenarios, in which we vary the dimensionality or make the environment nonstationary. Our findings show that DRL is a promising method to address the FPD problem in real operations, especially because of its speed in decision-making. However, no single DRL model outperforms the rest in all scenarios, and the best approach for each of the six core elements depends on the characteristics of the operational environment. While we agree on the potential of DRL to solve future complex problems in the aerospace industry, we also reflect on the importance of designing appropriate models and training procedures, understanding the applicability of such models, and reporting the main performance tradeoffs.
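To make the role of the six core elements concrete, the following is a minimal, self-contained Python sketch of how they fit together in a training loop. The class and variable names (FrequencyPlanEnv, num_beams, num_channels) and the toy adjacent-beam conflict reward are illustrative assumptions, not the model or environment described in this paper.

```python
# Illustrative sketch only: a hypothetical skeleton showing where the six
# core elements discussed in the paper would plug in. Not the paper's code.
import numpy as np

class FrequencyPlanEnv:
    """Toy stand-in for an FPD training environment (element 6: environment)."""

    def __init__(self, num_beams=8, num_channels=4):
        self.num_beams = num_beams
        self.num_channels = num_channels
        self.assignments = None

    def reset(self):
        # State representation (element 3): channel assigned to each beam,
        # with -1 meaning "not yet assigned".
        self.assignments = -np.ones(self.num_beams, dtype=int)
        return self.assignments.copy()

    def step(self, action):
        # Action representation (element 4): a (beam index, channel index) pair.
        beam, channel = action
        self.assignments[beam] = channel
        # Reward (element 5): here, a toy penalty for adjacent beams sharing a
        # channel; a real reward would encode interference and demand metrics.
        conflicts = sum(
            self.assignments[i] == self.assignments[i + 1] != -1
            for i in range(self.num_beams - 1)
        )
        reward = -float(conflicts)
        done = bool((self.assignments >= 0).all())
        return self.assignments.copy(), reward, done

def random_policy(state, env, rng):
    # Policy (element 1): maps state to action. A learned policy and its
    # optimizer (element 2), e.g. a neural network trained with a policy-
    # gradient method, would replace this random choice.
    unassigned = np.flatnonzero(state < 0)
    beam = rng.choice(unassigned)
    channel = rng.integers(env.num_channels)
    return beam, channel

env = FrequencyPlanEnv()
rng = np.random.default_rng(0)
state, done, total = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(random_policy(state, env, rng))
    total += reward
print("episode return:", total)
```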