A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games


October 4, 2022


Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Tong Zhang

From arXiv; accepted to ICML 2022

Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively build on the "optimism in the face of uncertainty" (OFU) principle. This work focuses on a different approach of posterior sampling, which is celebrated in many bandits and reinforcement learning settings but remains under-explored for MGs. Specifically, for episodic two-player zero-sum MGs, a novel posterior sampling algorithm is developed with general function approximation. Theoretical analysis demonstrates that the posterior sampling algorithm admits a $\sqrt{T}$-regret bound for problems with a low multi-agent decoupling coefficient, which is a new complexity measure for MGs, where $T$ denotes the number of episodes. When specialized to linear MGs, the obtained regret bound matches the state-of-the-art results. To the best of our knowledge, this is the first provably efficient posterior sampling algorithm for MGs with frequentist regret guarantees, which enriches the toolbox for MGs and promotes the broad applicability of posterior sampling.
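To make the $\sqrt{T}$-regret claim concrete, the block below sketches one common way regret is defined for the max-player in an episodic two-player zero-sum MG, and what a $\sqrt{T}$ rate implies. The notation ($\pi^t$, $V_1^{*}$, $x_1$, $\dagger$) is assumed for illustration and is not taken from the abstract; the paper's own definitions may differ in detail, and problem-dependent factors such as the horizon and the decoupling coefficient are suppressed in $\tilde{O}(\cdot)$.

```latex
% Assumed notation (illustrative, not from the abstract):
%   \pi^t                : max-player policy used in episode t
%   V_1^{*}(x_1)         : Nash value at the initial state x_1
%   V_1^{\pi^t,\dagger}  : value of \pi^t against a best-responding min-player
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \Bigl[ V_1^{*}(x_1) - V_1^{\pi^t,\dagger}(x_1) \Bigr],
\qquad
\mathrm{Reg}(T) = \tilde{O}\bigl(\sqrt{T}\bigr)
\;\Longrightarrow\;
\frac{\mathrm{Reg}(T)}{T} = \tilde{O}\bigl(1/\sqrt{T}\bigr) \to 0 .
```

Under this definition each summand is nonnegative, so sublinear regret means the average gap between the learned policy and the Nash value vanishes as the number of episodes grows.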

