A Tale of Sampling and Estimation in Discounted Reinforcement Learning - 专知 Papers


Markov process · time series · mean · reinforcement learning · stationary distribution

April 11, 2023

A Tale of Sampling and Estimation in Discounted Reinforcement Learning


Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

From arXiv, AISTATS 2023

The most relevant problems in discounted reinforcement learning involve estimating the mean of a function under the stationary distribution of a Markov reward process, such as the expected return in policy evaluation, or the policy gradient in policy optimization. In practice, these estimates are produced through a finite-horizon episodic sampling, which neglects the mixing properties of the Markov process. It is mostly unclear how this mismatch between the practical and the ideal setting affects the estimation, and the literature lacks a formal study on the pitfalls of episodic sampling, and how to do it optimally. In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor. Then, we provide a statistical analysis on a set of notable estimators and the corresponding sampling procedures, which includes the finite-horizon estimators often used in practice. Crucially, we show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties w.r.t. the alternative estimators, as it matches the lower bound without requiring a careful tuning of the episode horizon.
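To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-state Markov chain (the switch probability `P_SWITCH`, the function `f`, and all function names are illustrative choices, not taken from the paper). It compares a truncated finite-horizon estimator of the discounted mean $(1-\gamma)\sum_{t \ge 0} \gamma^t \, \mathbb{E}[f(X_t)]$ with an estimator that samples states from the discounted kernel, here realized by the standard device of stopping the chain with probability $1-\gamma$ at every step.

```python
import random

P_SWITCH = 0.1  # assumed: probability of switching state at each step


def step(state, rng):
    """One transition of the hypothetical symmetric two-state chain."""
    return 1 - state if rng.random() < P_SWITCH else state


def f(state):
    """Illustrative function whose discounted mean is estimated."""
    return float(state)


def exact_discounted_mean(gamma):
    """Closed form for this chain started in state 0.

    P(X_t = 1) = 0.5 * (1 - lam**t) with lam = 1 - 2 * P_SWITCH, hence
    (1 - gamma) * sum_t gamma**t * P(X_t = 1)
        = 0.5 * (1 - (1 - gamma) / (1 - gamma * lam)).
    """
    lam = 1.0 - 2.0 * P_SWITCH
    return 0.5 * (1.0 - (1.0 - gamma) / (1.0 - gamma * lam))


def finite_horizon_estimate(gamma, horizon, n_episodes, rng):
    """Average of truncated discounted sums over fixed-length episodes.

    Truncation drops the tail beyond `horizon`, so the estimate is biased
    unless the horizon is chosen large enough relative to gamma.
    """
    total = 0.0
    for _ in range(n_episodes):
        state, discount, ret = 0, 1.0, 0.0
        for _ in range(horizon):
            ret += discount * f(state)
            discount *= gamma
            state = step(state, rng)
        total += (1.0 - gamma) * ret
    return total / n_episodes


def discounted_kernel_estimate(gamma, n_samples, rng):
    """Average of f over states drawn from the discounted kernel.

    Each sample restarts in state 0 and stops with probability 1 - gamma
    before every transition, so the stopping time T satisfies
    P(T = t) = (1 - gamma) * gamma**t and E[f(X_T)] is the discounted mean.
    """
    total = 0.0
    for _ in range(n_samples):
        state = 0
        while rng.random() < gamma:
            state = step(state, rng)
        total += f(state)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    gamma = 0.9
    print("exact            :", exact_discounted_mean(gamma))
    print("finite horizon   :", finite_horizon_estimate(gamma, 50, 2000, rng))
    print("discounted kernel:", discounted_kernel_estimate(gamma, 20000, rng))
```

In this sketch the geometric stopping rule needs no horizon parameter, while the finite-horizon estimator carries a truncation bias that depends on the chosen `horizon`. The toy chain only illustrates the two sampling procedures and says nothing about the paper's bounds.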


