Many real-world multi-agent reinforcement learning applications require agents to communicate with one another through a communication protocol. These applications share a critical problem: limited communication bandwidth constrains the agents' ability to cooperate successfully. In this paper, rather than proposing a fixed communication protocol, we develop an Informative Multi-Agent Communication (IMAC) method that learns efficient communication protocols. Our contributions are threefold. First, we observe that a limited bandwidth translates into a constraint on the entropy of the communicated messages, which provides a way to control the bandwidth. Second, we introduce a customized batch-norm layer that controls the messages' entropy to simulate the limited-bandwidth constraint. Third, we apply the information bottleneck method to discover the optimal communication protocol, which satisfies a bandwidth constraint by training with the prior distribution used in the method. To demonstrate the efficacy of our method, we conduct extensive experiments on a variety of cooperative and competitive multi-agent tasks along two dimensions: the number of agents and the available bandwidth. We show that IMAC converges quickly and, compared with many baseline methods, leads to efficient communication among agents under the limited-bandwidth constraint.
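The link between message entropy and a prior distribution can be illustrated with a minimal sketch. It assumes, purely for illustration, that each message is a scalar Gaussian; the function names, the Gaussian form, and the standard-normal prior are assumptions and are not taken from the paper's implementation. Under that assumption the differential entropy depends only on the standard deviation, and the KL divergence to a fixed prior is the kind of penalty that variational information bottleneck objectives typically use.

```python
import math


def gaussian_entropy_nats(sigma):
    """Differential entropy of N(mu, sigma^2) in nats (independent of mu)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def kl_to_prior(mu, sigma, prior_mu=0.0, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(prior_mu, prior_sigma^2) ) in nats.

    Hypothetical regularizer: penalizes messages whose distribution
    drifts away from the chosen prior.
    """
    return (
        math.log(prior_sigma / sigma)
        + (sigma ** 2 + (mu - prior_mu) ** 2) / (2.0 * prior_sigma ** 2)
        - 0.5
    )


# Illustrative values only: a narrower message distribution has lower entropy.
narrow = gaussian_entropy_nats(0.5)
wide = gaussian_entropy_nats(2.0)

# A message distribution equal to the prior incurs zero penalty.
penalty_at_prior = kl_to_prior(0.0, 1.0)
penalty_off_prior = kl_to_prior(1.5, 0.5)
```

In this sketch, shrinking `sigma` lowers the entropy, so a layer that rescales message statistics can act as an entropy control, and the KL term grows as the message distribution moves away from the prior.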