Can We Find Nash Equilibria at a Linear Rate in Markov Games? - 专知论文


March 3, 2023


Zhuoqing Song, Jason D. Lee, Zhuoran Yang

From arXiv, ICLR 2023

We study decentralized learning in two-player zero-sum discounted Markov games, where the goal is to design a policy optimization algorithm for either agent that satisfies two properties. First, the player does not need to know the opponent's policy in order to update its own. Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game. To this end, we construct a meta-algorithm, dubbed $\texttt{Homotopy-PO}$, which provably finds a Nash equilibrium at a global linear rate. In particular, $\texttt{Homotopy-PO}$ interweaves two base algorithms, $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$, via homotopy continuation. $\texttt{Local-Fast}$ enjoys local linear convergence, while $\texttt{Global-Slow}$ converges globally but at a slower, sublinear rate. As the meta-algorithm switches between the two, $\texttt{Global-Slow}$ essentially serves as a "guide" that identifies a benign neighborhood in which $\texttt{Local-Fast}$ converges quickly. However, since the exact size of this neighborhood is unknown, we apply a doubling trick to switch between the two base algorithms. The switching scheme is carefully designed so that the aggregate performance of the algorithm is driven by $\texttt{Local-Fast}$. Furthermore, we prove that $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$ can both be instantiated by variants of the optimistic gradient descent/ascent (OGDA) method, which is of independent interest.
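To make the OGDA building block mentioned in the abstract more concrete, the following is a minimal sketch of projected optimistic gradient descent/ascent on a single-state zero-sum matrix game $\min_x \max_y x^\top A y$ over probability simplices. It is an illustration under stated assumptions, not the paper's $\texttt{Local-Fast}$ or $\texttt{Global-Slow}$ variants and not $\texttt{Homotopy-PO}$ itself: the function names (`project_simplex`, `ogda_matrix_game`, `duality_gap`), the step size, the iteration count, and the matching-pennies payoff matrix are all hypothetical choices made for this example. Note that each player only uses the payoff feedback `A y` or `A^T x`, in the spirit of the decentralized setting.

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:          # holds for a prefix of k; keep the last valid shift
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def matvec(A, y):
    """Compute A y (the min-player's gradient of x^T A y)."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def mat_t_vec(A, x):
    """Compute A^T x (the max-player's gradient of x^T A y)."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def ogda_matrix_game(A, x0, y0, eta=0.05, steps=5000):
    """Projected OGDA: x minimizes and y maximizes x^T A y (illustrative only)."""
    x_hat, y_hat = list(x0), list(y0)   # auxiliary ("base") iterates
    x, y = list(x0), list(y0)           # played iterates
    for _ in range(steps):
        # Optimistic step: reuse the gradient observed at the previous iterate.
        gx, gy = matvec(A, y), mat_t_vec(A, x)
        x_new = project_simplex([a - eta * g for a, g in zip(x_hat, gx)])
        y_new = project_simplex([b + eta * g for b, g in zip(y_hat, gy)])
        # Base update: use the gradient observed at the newly played iterate.
        gx_new, gy_new = matvec(A, y_new), mat_t_vec(A, x_new)
        x_hat = project_simplex([a - eta * g for a, g in zip(x_hat, gx_new)])
        y_hat = project_simplex([b + eta * g for b, g in zip(y_hat, gy_new)])
        x, y = x_new, y_new
    return x, y


def duality_gap(A, x, y):
    """max_y' x^T A y' - min_x' x'^T A y; zero exactly at a Nash equilibrium."""
    return max(mat_t_vec(A, x)) - min(matvec(A, y))


if __name__ == "__main__":
    # Matching pennies (assumed example); its unique equilibrium is uniform play.
    A = [[1.0, -1.0], [-1.0, 1.0]]
    x, y = ogda_matrix_game(A, [0.9, 0.1], [0.2, 0.8])
    print("x =", x, "y =", y, "gap =", duality_gap(A, x, y))
```

The duality gap is used here as the progress measure because it vanishes only at a Nash equilibrium; tracking it across iterations is one simple way to observe the last-iterate behavior of the sketch on this small game.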

