Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning

Multi-agent formation with obstacle avoidance is one of the most actively studied topics in the field of multi-agent systems. Although classic controllers such as model predictive control (MPC) and fuzzy control achieve a certain measure of success, most of them require precise global information, which is not accessible in harsh environments. On the other hand, some reinforcement learning (RL) based approaches adopt a leader-follower structure to organize the agents' behaviors, which sacrifices collaboration between agents and therefore suffers from bottlenecks in maneuverability and robustness. In this paper, we propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL). Agents in our system use only local and relative information to make decisions and control themselves in a distributed manner. If any agent is disconnected, the agents in the system quickly reorganize themselves into a new topology. Compared with baselines (both classic control methods and another RL-based method), our method achieves better formation error and formation convergence rate, and an on-par success rate of obstacle avoidance. The feasibility of our method is verified by both simulation and hardware implementation with Ackermann-steering vehicles.
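
To make "local and relative information" and "formation error" concrete, the following is a minimal sketch and not the method of this paper. It assumes a planar agent with pose (x, y, heading), a hypothetical circular sensing range, and a formation error defined as the mean distance between observed and desired neighbor offsets. The function names, the observation layout, and the error definition are all illustrative assumptions.

```python
import math


def relative_observation(self_pose, neighbor_positions, sensing_range):
    """Return neighbor offsets in the agent's body frame (assumed observation).

    self_pose: (x, y, heading) in the world frame, heading in radians.
    neighbor_positions: list of (x, y) world-frame positions.
    sensing_range: hypothetical radius beyond which neighbors are not seen.
    """
    x, y, heading = self_pose
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    observed = []
    for nx, ny in neighbor_positions:
        dx, dy = nx - x, ny - y
        if math.hypot(dx, dy) <= sensing_range:
            # Rotate the world-frame offset by -heading into the body frame,
            # so the result does not depend on any global coordinates.
            observed.append((cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy))
    return observed


def formation_error(relative_positions, desired_offsets):
    """Mean Euclidean gap between observed and desired offsets (assumed metric).

    The two lists are assumed to be in matching neighbor order.
    """
    gaps = [
        math.hypot(rx - dx, ry - dy)
        for (rx, ry), (dx, dy) in zip(relative_positions, desired_offsets)
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0
```

In this sketch, an agent facing along the world y-axis sees a neighbor one unit ahead as the body-frame offset (1, 0), and a neighbor outside the sensing range is simply absent from the observation, which mirrors the disconnected-agent case in a simplified way.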
