Online Double Oracle - Zhuanzhi Papers


February 15, 2023

Online Double Oracle


Le Cong Dinh,Yaodong Yang,Stephen McAleer,Zheng Tian,Nicolas Perez Nieves,Oliver Slumbers,David Henry Mguni,Haitham Bou Ammar,Jun Wang

From arXiv. Accepted at Transactions on Machine Learning Research (TMLR).

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods from game theory. Our method, Online Double Oracle (ODO), provably converges to a Nash equilibrium (NE). Most importantly, unlike standard DO methods, ODO is rational in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$, where $k$ is not the total number of pure strategies but rather the size of the effective strategy set, which is linearly dependent on the support size of the NE. On tens of different real-world games, ODO outperforms DO, PSRO methods, and no-regret algorithms such as Multiplicative Weight Update by a significant margin, both in terms of convergence rate to an NE and average payoff against strategic adversaries.
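The abstract does not spell out the update rule, so the following is only a minimal Python sketch of one way to combine Multiplicative Weight Update with a growing, Double-Oracle-style strategy set. It is not the authors' implementation. The class `OnlineOracleLearner`, the functions `self_play` and `exploitability`, the toy matrix `GAME`, the step size `eta` and the reset-on-new-best-response rule are all assumptions made for illustration.

```python
import math

# Hypothetical toy game: entry [i][j] is the row player's loss.
# Actions 0-2 form rock-paper-scissors; actions 3 and 4 are dominated
# for both players, so they are not in the equilibrium support.
GAME = [
    [0, 1, -1, -1, -1],
    [-1, 0, 1, -1, -1],
    [1, -1, 0, -1, -1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
]


class OnlineOracleLearner:
    """MWU restricted to a growing 'effective' set of pure strategies."""

    def __init__(self, n_actions, eta):
        self.n = n_actions
        self.eta = eta
        self.effective = [0]                  # arbitrary starting strategy
        self.weights = [1.0]                  # one weight per effective action
        self.window_loss = [0.0] * n_actions  # cumulative loss in this window

    def strategy(self):
        """Mixed strategy over all actions; zero outside the effective set."""
        total = sum(self.weights)
        probs = [0.0] * self.n
        for a, w in zip(self.effective, self.weights):
            probs[a] = w / total
        return probs

    def update(self, loss):
        # For brevity this scans every pure strategy; a large-scale version
        # would query a best-response oracle instead (assumption).
        self.window_loss = [c + l for c, l in zip(self.window_loss, loss)]
        br = min(range(self.n), key=self.window_loss.__getitem__)
        if br not in self.effective:
            # New best response: grow the set, restart MWU and the window.
            self.effective.append(br)
            self.weights = [1.0] * len(self.effective)
            self.window_loss = [0.0] * self.n
        else:
            # Standard multiplicative weight update on the effective set.
            self.weights = [w * math.exp(-self.eta * loss[a])
                            for a, w in zip(self.effective, self.weights)]
            top = max(self.weights)
            self.weights = [w / top for w in self.weights]  # avoid underflow


def self_play(A, rounds=2000, eta=0.1):
    """Both players run the learner; returns average strategies and sets."""
    n, m = len(A), len(A[0])
    row, col = OnlineOracleLearner(n, eta), OnlineOracleLearner(m, eta)
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        x, y = row.strategy(), col.strategy()
        avg_x = [a + p / rounds for a, p in zip(avg_x, x)]
        avg_y = [a + p / rounds for a, p in zip(avg_y, y)]
        row_loss = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_loss = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        row.update(row_loss)
        col.update(col_loss)
    return avg_x, avg_y, row.effective, col.effective


def exploitability(A, x, y):
    """Duality gap of (x, y); it is zero exactly at a Nash equilibrium."""
    n, m = len(A), len(A[0])
    col_best = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    row_best = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return col_best - row_best
```

Calling `self_play(GAME)` returns the time-averaged strategies and each player's effective set, and `exploitability(GAME, avg_x, avg_y)` measures how far those averages are from an NE. In this toy game the dominated actions 3 and 4 are never a best response, so they never enter either effective set, which illustrates why the set size can track the NE support rather than the full action count.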


