Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game


May 31, 2022


Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Liwei Wang, Tong Zhang

Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected dataset without further interactions with the environment. While various algorithms have been proposed for offline RL in the previous literature, the minimax optimal performance has only been (nearly) achieved for tabular Markov decision processes (MDPs). In this paper, we focus on offline RL with linear function approximation and propose two new algorithms, SPEVI+ and SPMVI+, for single-agent MDPs and two-player zero-sum Markov games (MGs), respectively. The proposed algorithms feature carefully crafted data splitting mechanisms and novel variance-reduction pessimistic estimators. Theoretical analysis demonstrates that they are capable of matching the performance lower bounds up to logarithmic factors. As a byproduct, a new performance lower bound is established for MGs, which tightens the existing results. To the best of our knowledge, these are the first computationally efficient and nearly minimax optimal algorithms for offline single-agent MDPs and MGs with linear function approximation.
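To make the ideas of data splitting and pessimistic estimation concrete, the following is a minimal sketch of a generic pessimistic least-squares value iteration for a finite-horizon MDP with linear features. It is not SPEVI+ or SPMVI+: the abstract does not give their details, and the variance-reduction (variance-weighted regression) component is omitted here. The function names `split_by_step` and `pessimistic_lsvi`, the penalty scale `beta`, the ridge parameter `lam`, and the round-robin splitting rule are all assumptions made for illustration.

```python
import numpy as np


def split_by_step(trajectories, H):
    """Hypothetical splitting rule: trajectory i contributes only its step-(i % H)
    transition, so the data used at step h is disjoint from the data that
    produced the value estimate for step h + 1."""
    folds = [[] for _ in range(H)]
    for i, traj in enumerate(trajectories):
        h = i % H
        folds[h].append(traj[h])  # traj[h] = (s, a, r, s_next)
    return folds


def pessimistic_lsvi(features, folds, beta=1.0, lam=1.0):
    """Backward value iteration with ridge regression and an uncertainty penalty.

    features: array of shape (S, A, d) holding phi(s, a).
    folds:    list of length H; folds[h] is a non-empty list of
              (s, a, r, s_next) tuples used only at step h.
    Returns a greedy policy of shape (H, S) and values of shape (H + 1, S).
    """
    S, A, d = features.shape
    H = len(folds)
    V = np.zeros((H + 1, S))
    policy = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        s, a, r, s_next = (np.array(col) for col in zip(*folds[h]))
        X = features[s, a]                    # (n, d) design matrix
        y = r + V[h + 1, s_next]              # regression targets
        Lambda = X.T @ X + lam * np.eye(d)    # regularized covariance
        w = np.linalg.solve(Lambda, X.T @ y)  # ridge solution
        Lambda_inv = np.linalg.inv(Lambda)
        # Penalty beta * sqrt(phi^T Lambda^{-1} phi) for every (s, a) pair.
        quad = np.einsum("sad,de,sae->sa", features, Lambda_inv, features)
        bonus = beta * np.sqrt(np.maximum(quad, 0.0))
        # Pessimistic Q-value: subtract the penalty, then clip to the valid range.
        Q = np.clip(features @ w - bonus, 0.0, H - h)
        policy[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return policy, V
```

Subtracting the penalty means that state-action pairs poorly covered by the dataset receive low estimated values, so the greedy policy avoids them. The sketch assumes rewards lie in [0, 1] and that every fold is non-empty.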

