Learning in Nonzero-Sum Stochastic Games with Potentials

Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agent systems. In this paper, we introduce a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings. In particular, we study the MARL problem in a class of games known as stochastic potential games (SPGs) with continuous state-action spaces. Unlike cooperative games, in which all agents share a common reward, SPGs are capable of modelling real-world scenarios where agents seek to fulfil their individual goals. We prove theoretically that our learning method, SPot-AC, enables independent agents to learn Nash equilibrium strategies in polynomial time.

