The study and benchmarking of Deep Reinforcement Learning (DRL) models has become a trend in many industries, including aerospace engineering and communications. Recent studies in these fields propose such models to address complex real-time decision-making problems in which classical approaches either do not meet time requirements or fail to obtain optimal solutions. While the good performance of DRL models has been demonstrated for specific use cases or scenarios, most studies do not discuss the compromises and generalizability of such models during real operations. In this paper we explore the tradeoffs of different elements of DRL models and how they might impact the final performance. To that end, we choose the Frequency Plan Design (FPD) problem in the context of multibeam satellite constellations as our use case and propose a DRL model to address it. We identify six core elements that have a major effect on its performance: the policy, the policy optimizer, the state, action, and reward representations, and the training environment. We analyze different alternatives for each of these elements and characterize their effects. We also use multiple environments to account for different scenarios, in which we vary the dimensionality or make the environment nonstationary. Our findings show that DRL is a promising method to address the FPD problem in real operations, especially because of its speed in decision-making. However, no single DRL model outperforms the rest in all scenarios, and the best approach for each of the six core elements depends on the characteristics of the operational environment. While we agree on the potential of DRL to solve future complex problems in the aerospace industry, we also reflect on the importance of designing appropriate models and training procedures, understanding the applicability of such models, and reporting the main performance tradeoffs.
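To make the role of the six core elements concrete, the following is a minimal, self-contained Python sketch of how they fit together in a training loop. The class and variable names (FrequencyPlanEnv, num_beams, num_channels) and the toy adjacent-beam conflict reward are illustrative assumptions, not the model or environment described in this paper.

```python
# Illustrative sketch only: a hypothetical skeleton showing where the six
# core elements discussed in the paper would plug in. Not the paper's code.
import numpy as np

class FrequencyPlanEnv:
    """Toy stand-in for an FPD training environment (element 6: environment)."""

    def __init__(self, num_beams=8, num_channels=4):
        self.num_beams = num_beams
        self.num_channels = num_channels
        self.assignments = None

    def reset(self):
        # State representation (element 3): channel assigned to each beam,
        # with -1 meaning "not yet assigned".
        self.assignments = -np.ones(self.num_beams, dtype=int)
        return self.assignments.copy()

    def step(self, action):
        # Action representation (element 4): a (beam index, channel index) pair.
        beam, channel = action
        self.assignments[beam] = channel
        # Reward (element 5): here, a toy penalty for adjacent beams sharing a
        # channel; a real reward would encode interference and demand metrics.
        conflicts = sum(
            self.assignments[i] == self.assignments[i + 1] != -1
            for i in range(self.num_beams - 1)
        )
        reward = -float(conflicts)
        done = bool((self.assignments >= 0).all())
        return self.assignments.copy(), reward, done

def random_policy(state, env, rng):
    # Policy (element 1): maps state to action. A learned policy and its
    # optimizer (element 2), e.g. a neural network trained with a policy-
    # gradient method, would replace this random choice.
    unassigned = np.flatnonzero(state < 0)
    beam = rng.choice(unassigned)
    channel = rng.integers(env.num_channels)
    return beam, channel

env = FrequencyPlanEnv()
rng = np.random.default_rng(0)
state, done, total = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(random_policy(state, env, rng))
    total += reward
print("episode return:", total)
```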