Simultaneous learning of state-to-state minimum-time planning and control

This paper tackles the challenge of learning a generalizable minimum-time flight policy for UAVs, capable of navigating between arbitrary start and goal states while balancing agile flight and stable hovering. Traditional approaches, particularly in autonomous drone racing, achieve impressive speeds and agility but are constrained to predefined track layouts, limiting real-world applicability. To address this, we propose a reinforcement learning-based framework that simultaneously learns state-to-state minimum-time planning and control and generalizes to arbitrary state-to-state flights. Our approach leverages Point Mass Model (PMM) trajectories as proxy rewards to approximate the true optimal flight objective and employs curriculum learning to scale the training process efficiently and to achieve generalization. We validate our method through simulation experiments, comparing it against Nonlinear Model Predictive Control (NMPC) tracking PMM-generated trajectories and conducting ablation studies to assess the impact of curriculum learning. Finally, real-world experiments confirm the robustness of our learned policy in outdoor environments, demonstrating its ability to generalize and operate on a small ARM-based single-board computer.
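
To make the idea of a PMM trajectory serving as a proxy reward concrete, the sketch below works through the simplest case: a one-dimensional point mass with bounded acceleration moving rest to rest, for which the minimum-time solution is bang-bang (full acceleration for half the time, full deceleration for the rest). This is an illustration under stated assumptions and not the paper's formulation. The function names, the single-axis rest-to-rest restriction, and the reward shape (negative distance to the PMM reference position) are all hypothetical.

```python
import math


def pmm_rest_to_rest_time(distance: float, a_max: float) -> float:
    """Minimum time for a 1D point mass to cover `distance`, starting and
    ending at rest, with acceleration bounded by |a| <= a_max."""
    if a_max <= 0.0:
        raise ValueError("a_max must be positive")
    # Accelerate over the first half of the distance, decelerate over the second.
    return 2.0 * math.sqrt(abs(distance) / a_max)


def pmm_position(t: float, distance: float, a_max: float) -> float:
    """Position on the bang-bang profile at time t (clamped to [0, T])."""
    total = pmm_rest_to_rest_time(distance, a_max)
    sign = math.copysign(1.0, distance)
    t = min(max(t, 0.0), total)
    if t <= total / 2.0:
        # Acceleration phase, starting from rest at the origin.
        return sign * 0.5 * a_max * t * t
    # Deceleration phase, mirrored about the end point.
    remaining = total - t
    return distance - sign * 0.5 * a_max * remaining * remaining


def proxy_reward(position: float, t: float, distance: float, a_max: float) -> float:
    """Hypothetical proxy reward: negative distance to the PMM reference.
    The reward used in the paper may differ; this only illustrates the idea."""
    return -abs(position - pmm_position(t, distance, a_max))
```

For example, with `distance = 8.0` and `a_max = 2.0`, the reference takes 4 s in total and passes the midpoint (4 m) at 2 s. A state-to-state planner with nonzero start and goal velocities in three dimensions would need a more general solution than this single-axis case.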
