Truncating Trajectories in Monte Carlo Reinforcement Learning - 专知论文

会员服务 ·

0

蒙特卡罗 · 期望回报 · Learning · Agent · 总回报 ·

2023 年 5 月 7 日

Truncating Trajectories in Monte Carlo Reinforcement Learning

翻译：暂无翻译

Riccardo Poiani,Alberto Maria Metelli,Marcello Restelli

In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return. In practice, in many tasks of interest, such as policy optimization, the agent usually spends its interaction budget by collecting episodes of fixed length within a simulator (i.e., Monte Carlo simulation). However, given the discounted nature of the RL objective, this data collection strategy might not be the best option. Indeed, the rewards taken in early simulation steps weigh exponentially more than future rewards. Taking a cue from this intuition, in this paper, we design an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths, i.e., truncated. The proposed approach provably minimizes the width of the confidence intervals around the empirical estimates of the expected return of a policy. After discussing the theoretical properties of our method, we make use of our trajectory truncation mechanism to extend Policy Optimization via Importance Sampling (POIS, Metelli et al., 2018) algorithm. Finally, we conduct a numerical comparison between our algorithm and POIS: the results are consistent with our theory and show that an appropriate truncation of the trajectories can succeed in improving performance.

翻译：暂无翻译

0

相关内容

蒙特卡罗

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

间接优化的高效Monte Carlo声传播研究

国家自然科学基金

0+阅读 · 2017年12月31日

三维结构表面上石墨烯的可控制备方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于β/B2相自润滑的TiAl-RE合金优化设计与管材制备基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

刚柔嵌段共聚物自组装行为的快速非格子Monte Carlo模拟研究

国家自然科学基金

0+阅读 · 2009年12月31日

ICF中高能电子和离子输运的Monte-Carlo算法研究和程序研制

国家自然科学基金

0+阅读 · 2009年12月31日

Value Gradient weighted Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月20日

Bootstrapped Representations in Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Fairness in Preference-based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Arxiv

34+阅读 · 2022年6月30日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

VIP会员

文章信息

相关主题

相关VIP内容

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

因果强化学习的统一框架：综述、分类体系、算法与应用

《无人机系统 - 反无人机系统：测试方法》364页

【MIT博士论文】语言模型的推理时学习算法

美军低成本无人作战攻击系统（LUCAS）：扩大无人机战争规模

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Value Gradient weighted Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月20日

Bootstrapped Representations in Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Fairness in Preference-based Reinforcement Learning

Arxiv

0+阅读 · 2023年6月16日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Arxiv

34+阅读 · 2022年6月30日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

相关基金

间接优化的高效Monte Carlo声传播研究

国家自然科学基金

0+阅读 · 2017年12月31日

三维结构表面上石墨烯的可控制备方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于β/B2相自润滑的TiAl-RE合金优化设计与管材制备基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

刚柔嵌段共聚物自组装行为的快速非格子Monte Carlo模拟研究

国家自然科学基金

0+阅读 · 2009年12月31日

ICF中高能电子和离子输运的Monte-Carlo算法研究和程序研制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员