一种几乎接近黑色最佳的政策梯度方法 (A nearly Blackwell-optimal policy gradient method) - 专知论文

会员服务 ·

0

优化器 · 衰减系数 · 有偏 · Continuity · Performer ·

2021 年 5 月 28 日

A nearly Blackwell-optimal policy gradient method

翻译：一种几乎接近黑色最佳的政策梯度方法

Vektor Dewanto,Marcus Gallagher

from arxiv, 26 pages (9-page main content)

For continuing environments, reinforcement learning methods commonly maximize a discounted reward criterion with discount factor close to 1 in order to approximate the steady-state reward (the gain). However, such a criterion only considers the long-run performance, ignoring the transient behaviour. In this work, we develop a policy gradient method that optimizes the gain, then the bias (which indicates the transient performance and is important to capably select from policies with equal gain). We derive expressions that enable sampling for the gradient of the bias, and its preconditioning Fisher matrix. We further propose an algorithm that solves the corresponding bi-level optimization using a logarithmic barrier. Experimental results provide insights into the fundamental mechanisms of our proposal.

翻译：对于持续环境,强化学习方法通常会尽量扩大贴现奖励标准,贴现系数接近1,以接近稳定状态奖励(收益),然而,这种标准只考虑长期业绩,而忽略短暂行为。在这项工作中,我们开发了一种政策梯度方法,优化收益,然后是偏差(表明短暂业绩,对以同等收益从政策中可任意选择很重要),我们从中得出一些表达方式,能够抽样选择偏差梯度,以及作为渔业矩阵的先决条件。我们进一步提出一种算法,用对数屏障解决相应的双层优化。实验结果为我们提案的基本机制提供了深刻的见解。

0

相关内容

优化器

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

5+阅读 · 2019年4月29日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

Arxiv

0+阅读 · 2021年7月20日

Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Arxiv

0+阅读 · 2021年7月19日

Distributed Planning for Serving Cooperative Tasks with Time Windows: A Game Theoretic Approach

Arxiv

0+阅读 · 2021年7月18日

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Arxiv

0+阅读 · 2021年7月18日

Refined Policy Improvement Bounds for MDPs

Arxiv

0+阅读 · 2021年7月16日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Generalization and Regularization in DQN

Generalization and Regularization in DQN

Arxiv

6+阅读 · 2019年1月30日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Optimal Transport for Multi-source Domain Adaptation under Target Shift

Arxiv

7+阅读 · 2018年3月13日

VIP会员

文章信息

相关主题

相关VIP内容

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《代码、指挥与冲突：描绘军事人工智能的未来》报告

【斯坦福博士论文】面向地理空间数据的多模态与多尺度建模：时空生成式人工智能

美国启动“自有军事人工智能计划”：采用谷歌Gemini以推动全军人工智能应用

《创新与适应性作为军事成功的关键因素：来自俄乌战争的战略洞见》报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

5+阅读 · 2019年4月29日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

Arxiv

0+阅读 · 2021年7月20日

Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Arxiv

0+阅读 · 2021年7月19日

Distributed Planning for Serving Cooperative Tasks with Time Windows: A Game Theoretic Approach

Arxiv

0+阅读 · 2021年7月18日

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Arxiv

0+阅读 · 2021年7月18日

Refined Policy Improvement Bounds for MDPs

Arxiv

0+阅读 · 2021年7月16日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Generalization and Regularization in DQN

Generalization and Regularization in DQN

Arxiv

6+阅读 · 2019年1月30日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Optimal Transport for Multi-source Domain Adaptation under Target Shift

Arxiv

7+阅读 · 2018年3月13日

微信扫码咨询专知VIP会员