高效价值传播最佳度等同新贪婪-Step Bellman (A Novel Greedy-Step Bellman Optimality Equation for Efficient Value Propagation) - 专知论文

会员服务 ·

0

贝尔曼最优方程 · INFORMS · 优化器 · Rainbow · Performer ·

2021 年 6 月 8 日

A Novel Greedy-Step Bellman Optimality Equation for Efficient Value Propagation

翻译：高效价值传播最佳度等同新贪婪-Step Bellman

Yuhui Wang,Qingyuan Wu,Pengcheng He,Xiaoyang Tan

Efficiently propagating credit to responsible actions is a central and challenging task in reinforcement learning. To accelerate information propagation, this paper presents a new method that bridges a highway that allows unimpeded information to flow across long horizons. The key to our method is a newly proposed Bellman equation, called Greedy-Step Bellman Optimality Equation, through which the high-credit information can fast propagate across a long horizon. We theoretically show that the solution of the new equation is exactly the optimal value function and the corresponding operator converges faster than the classical operator. Besides, it leads to a new multi-step off-policy algorithm, which is capable of safely utilizing any off-policy data collected by the arbitrary policy. Experiments reveal that the proposed method is reliable, easy to implement. Moreover, without employing additional components of Rainbow except Double DQN, our method achieves competitive performance with Rainbow on the benchmark tasks.

翻译：高效地宣传对负责任的行动的信用是强化学习中一项核心和艰巨的任务。为了加快信息传播,本文件提出了一个新的方法,连接一条能够让信息畅通无阻地跨过远界的高速公路。我们的方法的关键是新提出的贝尔曼方程式,叫做“贪婪-斯泰普·贝尔曼”最佳等式,高信用信息可以通过该方程式快速在长视野中传播。我们理论上表明,新方程式的解决方案恰恰是最佳价值功能,相应的操作员比古典操作员要快。此外,它导致一种新的多步骤的离政策算法,能够安全地利用任意政策收集的任何非政策数据。实验表明,拟议方法是可靠的,易于执行。此外,如果不使用彩虹的更多组成部分,除双QN外,我们的方法在基准任务上实现了彩虹的竞争性业绩。

0

相关内容

贝尔曼最优方程

贝尔曼最优方程

【ICML2021】核持续学习，Kernel Continual Learning

专知会员服务

32+阅读 · 2021年7月15日

[ICML2021]. GRAND：图神经扩散

专知会员服务

27+阅读 · 2021年7月11日

【ICML2021】有向图网络

专知会员服务

82+阅读 · 2021年5月10日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

34+阅读 · 2020年8月14日

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

62+阅读 · 2020年8月6日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

AI领域顶会AAMAS2020最佳论文出炉!《深度残差强化学习》牛津大学，Deep Residual RL

AI领域顶会AAMAS2020最佳论文出炉!《深度残差强化学习》牛津大学，Deep Residual RL

专知会员服务

45+阅读 · 2020年5月15日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

【强化学习】NIPS的最佳论文强化学习Value iteration Network 及代码；目前深度学习和增强学习交叉应用最火

【强化学习】NIPS的最佳论文强化学习Value iteration Network 及代码；目前深度学习和增强学习交叉应用最火

产业智能官

6+阅读 · 2017年9月1日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Negative norm estimates and superconvergence results in Galerkin method for strongly nonlinear parabolic problems

Arxiv

0+阅读 · 2021年8月3日

Immersed Boundary-Conformal Isogeometric Method for Linear Elliptic Problems

Arxiv

0+阅读 · 2021年8月2日

Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation

Arxiv

0+阅读 · 2021年8月2日

Bounds on expected propagation time of probabilistic zero forcing

Arxiv

0+阅读 · 2021年7月31日

PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving

Arxiv

0+阅读 · 2021年7月30日

A Novel Approach to Model the Kinematics of Human Fingers Based on an Elliptic Multi-Joint Configuration

Arxiv

0+阅读 · 2021年7月30日

A Theory of Label Propagation for Subpopulation Shift

Arxiv

7+阅读 · 2021年2月22日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

VIP会员

文章信息

相关主题

贝尔曼最优方程

相关VIP内容

【ICML2021】核持续学习，Kernel Continual Learning

专知会员服务

32+阅读 · 2021年7月15日

[ICML2021]. GRAND：图神经扩散

专知会员服务

27+阅读 · 2021年7月11日

【ICML2021】有向图网络

专知会员服务

82+阅读 · 2021年5月10日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

34+阅读 · 2020年8月14日

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

62+阅读 · 2020年8月6日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

AI领域顶会AAMAS2020最佳论文出炉!《深度残差强化学习》牛津大学，Deep Residual RL

AI领域顶会AAMAS2020最佳论文出炉!《深度残差强化学习》牛津大学，Deep Residual RL

专知会员服务

45+阅读 · 2020年5月15日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《超视距空战强化学习智能体的深度学习表征能力评估》最新70页

《第一人称视角无人机革命及其对陆战与其它战争维度的影响》最新19页报告

从兵棋推演到真实战场：人工智能指挥官在实战中的崛起

《小型无人机系统空域管理与控制：美陆军指挥官手册》最新34页

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

【强化学习】NIPS的最佳论文强化学习Value iteration Network 及代码；目前深度学习和增强学习交叉应用最火

【强化学习】NIPS的最佳论文强化学习Value iteration Network 及代码；目前深度学习和增强学习交叉应用最火

产业智能官

6+阅读 · 2017年9月1日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Negative norm estimates and superconvergence results in Galerkin method for strongly nonlinear parabolic problems

Arxiv

0+阅读 · 2021年8月3日

Immersed Boundary-Conformal Isogeometric Method for Linear Elliptic Problems

Arxiv

0+阅读 · 2021年8月2日

Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation

Arxiv

0+阅读 · 2021年8月2日

Bounds on expected propagation time of probabilistic zero forcing

Arxiv

0+阅读 · 2021年7月31日

PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving

Arxiv

0+阅读 · 2021年7月30日

A Novel Approach to Model the Kinematics of Human Fingers Based on an Elliptic Multi-Joint Configuration

Arxiv

0+阅读 · 2021年7月30日

A Theory of Label Propagation for Subpopulation Shift

Arxiv

7+阅读 · 2021年2月22日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

微信扫码咨询专知VIP会员