MAPPO-LCR: Multi-Agent Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games

Spatial public goods games model collective dilemmas where individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning methods. These approaches struggle to represent payoff coupling and non-stationarity in large interacting populations. This work introduces Multi-Agent Proximal Policy Optimization (MAPPO) into spatial public goods games for the first time. In these games, individual returns are intrinsically coupled through overlapping group interactions. Proximal Policy Optimization (PPO) treats agents as independent learners and ignores this coupling during value estimation. MAPPO addresses this limitation through a centralized critic that evaluates joint strategy configurations. To study neighborhood-level cooperation signals under this framework, we propose MAPPO with Local Cooperation Reward, termed MAPPO-LCR. The local cooperation reward aligns policy updates with surrounding cooperative density without altering the original game structure. MAPPO-LCR preserves decentralized execution while enabling population-level value estimation during training. Extensive simulations demonstrate stable cooperation emergence and reliable convergence across enhancement factors. Statistical analyses further confirm the learning advantage of MAPPO over PPO in spatial public goods games.
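The payoff coupling and the neighborhood-level signal described above can be made concrete with a small sketch. The abstract does not give the reward formula, so everything specific here is an assumption made for illustration: a periodic square lattice, von Neumann neighborhoods (groups of five), a unit contribution cost, and a hypothetical shaping term equal to the fraction of cooperating neighbors scaled by a hypothetical `weight`. It is not the authors' definition of the local cooperation reward.

```python
from typing import List

Grid = List[List[int]]  # 1 = cooperate, 0 = defect

# Assumption: von Neumann neighbourhood; the abstract does not fix the topology.
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def group_cooperators(grid: Grid, i: int, j: int) -> int:
    """Count cooperators in the group centred on (i, j), with periodic boundaries."""
    n, m = len(grid), len(grid[0])
    total = grid[i][j]
    for di, dj in NEIGHBOURS:
        total += grid[(i + di) % n][(j + dj) % m]
    return total


def pgg_payoff(grid: Grid, i: int, j: int, r: float, cost: float = 1.0) -> float:
    """Payoff of agent (i, j), summed over every overlapping group it belongs to."""
    n, m = len(grid), len(grid[0])
    group_size = len(NEIGHBOURS) + 1
    # The agent plays in its own group and in the group centred on each neighbour.
    centres = [(i, j)] + [((i + di) % n, (j + dj) % m) for di, dj in NEIGHBOURS]
    payoff = 0.0
    for ci, cj in centres:
        # The pooled contributions are multiplied by r and split equally.
        share = r * cost * group_cooperators(grid, ci, cj) / group_size
        # Only cooperators pay the contribution cost.
        payoff += share - cost * grid[i][j]
    return payoff


def local_cooperation_reward(grid: Grid, i: int, j: int) -> float:
    """Hypothetical shaping term: fraction of the agent's neighbours that cooperate."""
    n, m = len(grid), len(grid[0])
    coop = sum(grid[(i + di) % n][(j + dj) % m] for di, dj in NEIGHBOURS)
    return coop / len(NEIGHBOURS)


def shaped_reward(grid: Grid, i: int, j: int, r: float, weight: float = 0.5) -> float:
    """Game payoff plus the weighted local cooperation term (weight is an assumption)."""
    return pgg_payoff(grid, i, j, r) + weight * local_cooperation_reward(grid, i, j)
```

In this toy setup with r = 3, an agent on a fully cooperative lattice earns 10 from the game itself, while a lone defector surrounded by cooperators earns 12, which is the dilemma in miniature. The shaping term is added only to the training signal; `pgg_payoff` is left unchanged, consistent with the claim that the game structure is not altered.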
