We consider the problem of simultaneous learning in stochastic games with many players in the finite-horizon setting. While the typical target solution for a stochastic game is a Nash equilibrium, this is intractable with many players. We instead focus on variants of {\it correlated equilibria}, such as those studied for extensive-form games. We begin with a hardness result for the adversarial MDP problem: even for a horizon of 3, obtaining sublinear regret against the best non-stationary policy is \textsf{NP}-hard when both rewards and transitions are adversarial. This implies that convergence to even the weakest natural solution concept -- normal-form coarse correlated equilibrium -- is not possible via black-box reduction to a no-regret algorithm even in stochastic games with constant horizon (unless $\textsf{NP}\subseteq\textsf{BPP}$). Instead, we turn to a different target: algorithms which {\it generate} an equilibrium when they are used by all players. Our main result is an algorithm that generates an {\it extensive-form} correlated equilibrium, whose runtime is exponential in the horizon but polynomial in all other parameters. We give a similar algorithm which is polynomial in all parameters for ``fast-mixing'' stochastic games. We also show a method for efficiently reaching normal-form coarse correlated equilibria in ``single-controller'' stochastic games, which follows the traditional no-regret approach. When shared randomness is available, the two generative algorithms can be extended to give simultaneous regret bounds and converge in the traditional sense.
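For concreteness, the regret notion appearing in the hardness result can be written out as follows; this is the standard formulation for finite-horizon adversarial MDPs, and the notation ($r_t$, $P_t$, $\pi_t$, $V^{\pi}_{1}$) is illustrative rather than the paper's own:
\[
\mathrm{Reg}(T) \;=\; \max_{\pi\in\Pi}\sum_{t=1}^{T} V^{\pi}_{1}(r_t,P_t)\;-\;\sum_{t=1}^{T} V^{\pi_t}_{1}(r_t,P_t),
\]
where $\Pi$ is the class of non-stationary deterministic policies, $r_t$ and $P_t$ are the adversarially chosen rewards and transitions in episode $t$, $\pi_t$ is the learner's policy in episode $t$, and $V^{\pi}_{1}(r_t,P_t)$ is the value of $\pi$ from the initial state under that episode's rewards and transitions. Sublinear regret means $\mathrm{Reg}(T)=o(T)$.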