Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme that incorporates multi-head attention mechanisms in both the actor and the critic. This design facilitates dynamic inter-agent communication: agents can explicitly query their teammates, which helps manage the exponential growth of the joint-action space while ensuring a high degree of collaboration. We further introduce a penalized loss function that promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics: win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps.
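
To make the idea of agents "querying teammates" concrete, the following is a minimal sketch of multi-head self-attention applied across the agent axis, where each agent's embedding forms a query against the keys of every agent on the team. It is not the authors' implementation: the function name `team_attention`, the random weight matrices, and the sizes (3 agents, 8-dimensional embeddings, 2 heads) are all assumptions chosen for illustration, and the abstract does not specify how TAAC parameterizes its attention layers.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def team_attention(agent_embeddings, w_q, w_k, w_v, num_heads):
    """Multi-head self-attention over agents (illustrative, hypothetical helper).

    agent_embeddings: (n_agents, d_model) array, one row per agent.
    w_q, w_k, w_v:    (d_model, d_model) projection matrices.
    Returns the mixed embeddings (n_agents, d_model) and the attention
    weights (num_heads, n_agents, n_agents).
    """
    n_agents, d_model = agent_embeddings.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(x):
        # (n_agents, d_model) -> (num_heads, n_agents, d_head)
        return x.reshape(n_agents, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(agent_embeddings @ w_q)
    k = split_heads(agent_embeddings @ w_k)
    v = split_heads(agent_embeddings @ w_v)

    # Scaled dot-product scores: row i holds agent i's query against all agents.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge the heads back together.
    mixed = weights @ v
    output = mixed.transpose(1, 0, 2).reshape(n_agents, d_model)
    return output, weights


# Assumed toy sizes: 3 agents, 8-dimensional embeddings, 2 attention heads.
rng = np.random.default_rng(0)
n_agents, d_model, num_heads = 3, 8, 2
embeddings = rng.normal(size=(n_agents, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = team_attention(embeddings, w_q, w_k, w_v, num_heads)
print(output.shape, weights.shape)
```

Each row of `weights[h]` sums to one and can be read as how strongly a given agent attends to each teammate under head `h`. The cost of this step grows quadratically with the number of agents, rather than with the size of the joint-action space, which is one way to interpret the scalability claim in the abstract.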
