Communication-based multi-agent reinforcement learning (MARL) provides information exchange between agents, which promotes cooperation. However, existing methods perform poorly in large-scale multi-agent systems. In this paper, we adopt neighboring communication and propose Neighboring Variational Information Flow (NVIF), which provides efficient communication for agents. It employs a variational auto-encoder to compress the shared information into a latent state. This communication protocol does not depend on a specific task, so it can be pre-trained to stabilize MARL training. In addition, we combine NVIF with Proximal Policy Optimization (NVIF-PPO) and Deep Q Network (NVIF-DQN), and present a theoretical analysis showing that NVIF-PPO can promote cooperation. We evaluate NVIF-PPO and NVIF-DQN on MAgent, a widely used large-scale multi-agent environment, using two tasks with different map sizes. Experiments show that our method outperforms the compared methods and can learn effective and scalable cooperation strategies in large-scale multi-agent systems.
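To make the VAE-based compression step concrete, the following is a minimal PyTorch sketch of how shared neighbor messages could be encoded into a latent state and trained with a task-agnostic reconstruction objective, which is what makes pre-training possible. All module names, dimensions, and layer choices here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborMessageVAE(nn.Module):
    """Hypothetical sketch: compress concatenated neighbor messages
    into a latent state via a variational auto-encoder."""

    def __init__(self, msg_dim: int, latent_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Encoder maps neighbor messages to latent mean and log-variance.
        self.encoder = nn.Sequential(nn.Linear(msg_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder reconstructs the messages; the reconstruction loss does
        # not depend on the RL task, so this module can be pre-trained.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, msg_dim),
        )

    def forward(self, neighbor_msgs: torch.Tensor):
        h = self.encoder(neighbor_msgs)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z ~ N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, self.decoder(z), mu, logvar

def vae_loss(recon, target, mu, logvar):
    """Standard VAE objective: reconstruction + KL divergence to N(0, I)."""
    recon_loss = F.mse_loss(recon, target, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```

In such a sketch, the latent state z would be fed to each agent's policy network (e.g., the PPO or DQN head) in place of the raw neighbor messages.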