A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning

Multiagent reinforcement learning (MARL) can solve complex cooperative tasks. However, the efficiency of existing MARL methods relies heavily on well-defined reward functions. Multiagent tasks with sparse reward feedback are especially challenging, not only because of the credit distribution problem but also because positive reward feedback is rarely obtained. In this paper, we design a graph network called the Cooperation Graph (CG). The Cooperation Graph combines two simple bipartite graphs: the Agent Clustering subgraph (ACG) and the Cluster Designating subgraph (CDG). Based on this novel graph structure, we propose the Cooperation Graph Multiagent Reinforcement Learning (CG-MARL) algorithm, which deals efficiently with the sparse reward problem in multiagent tasks. In CG-MARL, agents are directly controlled by the Cooperation Graph, and a policy neural network is trained to manipulate this graph, guiding agents to cooperate implicitly. This hierarchical design leaves room for customized cluster-actions, which serve as an extensible interface for introducing fundamental cooperation knowledge. In experiments, CG-MARL shows state-of-the-art performance on sparse reward multiagent benchmarks, including the anti-invasion interception task and the multi-cargo delivery task.
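
To make the two-subgraph structure concrete, the following is a minimal sketch and not the authors' implementation. It assumes the ACG links each agent to exactly one cluster and the CDG links each cluster to exactly one "target"; the abstract does not say what the second vertex set of the CDG is, so "target" is a placeholder. The class name `CooperationGraph` and the methods `move_agent`, `designate`, and `agent_targets` are hypothetical, and the manual edge edits stand in for the decisions that the trained policy network would make.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CooperationGraph:
    """Toy Cooperation Graph: two bipartite assignments stored as index lists."""
    n_agents: int
    n_clusters: int
    n_targets: int  # assumption: CDG vertices on the far side are "targets"
    agent_to_cluster: List[int] = field(default_factory=list)   # ACG edges
    cluster_to_target: List[int] = field(default_factory=list)  # CDG edges

    def __post_init__(self) -> None:
        # Default wiring (an arbitrary choice): round-robin assignment.
        if not self.agent_to_cluster:
            self.agent_to_cluster = [a % self.n_clusters for a in range(self.n_agents)]
        if not self.cluster_to_target:
            self.cluster_to_target = [c % self.n_targets for c in range(self.n_clusters)]

    def move_agent(self, agent: int, cluster: int) -> None:
        """Rewire one ACG edge: put `agent` into `cluster`."""
        if not (0 <= agent < self.n_agents and 0 <= cluster < self.n_clusters):
            raise ValueError("agent or cluster index out of range")
        self.agent_to_cluster[agent] = cluster

    def designate(self, cluster: int, target: int) -> None:
        """Rewire one CDG edge: point `cluster` at `target`."""
        if not (0 <= cluster < self.n_clusters and 0 <= target < self.n_targets):
            raise ValueError("cluster or target index out of range")
        self.cluster_to_target[cluster] = target

    def agent_targets(self) -> List[int]:
        """Follow agent -> cluster -> target for every agent."""
        return [self.cluster_to_target[c] for c in self.agent_to_cluster]


cg = CooperationGraph(n_agents=4, n_clusters=2, n_targets=3)
cg.designate(cluster=1, target=2)   # hypothetical policy action on the CDG
cg.move_agent(agent=0, cluster=1)   # hypothetical policy action on the ACG
print(cg.agent_targets())           # [2, 2, 0, 2]
```

In this sketch a single CDG edit redirects every agent in a cluster at once, which illustrates the hierarchical control the abstract describes: the policy acts on graph edges rather than on individual agent actions.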
