Exposure-Based Multi-Agent Inspection of a Tumbling Target Using Deep Reinforcement Learning

As space becomes more congested, on-orbit inspection is an increasingly relevant activity, whether to observe a defunct satellite in order to plan repairs or to de-orbit it. However, on-orbit inspection is itself challenging and typically requires the careful coordination of multiple observer satellites. The task is complicated by a highly nonlinear environment in which the target may be unknown or moving unpredictably, leaving no time for continuous command and control from the ground. There is therefore a need for autonomous, robust, decentralized solutions to the inspection task. To this end, we consider a hierarchical, learned approach for the decentralized planning of multi-agent inspection of a tumbling target. Our solution consists of two components: a viewpoint (high-level) planner trained using deep reinforcement learning, and a navigation planner that handles point-to-point navigation between pre-specified viewpoints. We present a novel problem formulation and methodology that is not only suited to deriving robust policies through reinforcement learning, but is also extendable to unknown target geometries and to higher-fidelity information-theoretic objectives received directly from sensor inputs. Operating under limited information, our trained multi-agent high-level policies successfully contextualize information within the global hierarchical environment and are correspondingly able to inspect over 90% of non-convex tumbling targets, even in the absence of additional agent attitude control.
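
The two-level structure described in the abstract (a high-level viewpoint planner on top of a point-to-point navigation planner) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the viewpoints are eight points on a circle, the function `high_level_policy` is a random stand-in for the trained deep reinforcement learning policy, `navigate` is a straight-line stand-in for the navigation planner, and the sketch models a single agent with no target tumbling or orbital dynamics.

```python
import math
import random


def make_viewpoints(n=8, radius=10.0):
    # Hypothetical pre-specified viewpoints: n points evenly spaced on a
    # circle around the target (an illustrative assumption).
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n),
         0.0)
        for k in range(n)
    ]


def high_level_policy(visited, n_viewpoints, rng):
    # Placeholder for the learned viewpoint planner: choose a random
    # viewpoint that has not been visited yet, or None when all are done.
    unvisited = [i for i in range(n_viewpoints) if i not in visited]
    return rng.choice(unvisited) if unvisited else None


def navigate(start, goal, steps=5):
    # Placeholder for the navigation planner: straight-line waypoints from
    # start to goal (real relative orbital motion is not modelled here).
    return [
        tuple(s + (g - s) * t / steps for s, g in zip(start, goal))
        for t in range(1, steps + 1)
    ]


def run_episode(seed=0):
    rng = random.Random(seed)
    viewpoints = make_viewpoints()
    position = viewpoints[0]
    visited = {0}
    path = [position]
    while True:
        # High level: decide which viewpoint to go to next.
        target = high_level_policy(visited, len(viewpoints), rng)
        if target is None:
            break
        # Low level: move point-to-point to the chosen viewpoint.
        path.extend(navigate(position, viewpoints[target]))
        position = viewpoints[target]
        visited.add(target)
    return visited, path


if __name__ == "__main__":
    visited, path = run_episode()
    print(f"visited {len(visited)} viewpoints using {len(path)} waypoints")
```

The separation shown here is the point of the sketch: the high-level decision (which viewpoint next) is independent of how the transfer is flown, so a learned policy could replace `high_level_policy` without changing `navigate`.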
