Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment - Zhuanzhi Papers


Value function · State value function · Reducible · Variance · Functional

August 5, 2021

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment


Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

From arXiv; accepted by IJCAI 2021

Policy gradient methods are appealing in deep reinforcement learning but suffer from high variance in the gradient estimate. To reduce the variance, the state value function is commonly applied. However, the effect of the state value function becomes limited in stochastic dynamic environments, where unexpected state dynamics and rewards increase the variance. In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages information from the future to reduce the variance of the gradient estimate in stochastic dynamic environments. In particular, to obtain an ideally unbiased gradient estimate, we propose an information-theoretic approach that optimizes the embeddings of the future to be independent of previous actions. In our experiments, we apply the proposed hindsight value function in stochastic dynamic environments, covering both discrete-action and continuous-action environments. Compared with the standard state value function, the proposed hindsight value function consistently reduces the variance, stabilizes training, and improves the final policy.
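The variance argument in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's method: it assumes a hypothetical one-step environment with a Bernoulli policy `pi(a=1) = sigmoid(theta)` and reward `r = a + eps`, where `eps` is environment noise independent of the action. Instead of learning an embedding of the future, it hands the noise `eps` directly to the hindsight baseline. All names (`gradient_samples`, `theta`, `sigma`) are illustrative assumptions.

```python
import math
import random
import statistics


def gradient_samples(theta, sigma, n, seed=0):
    """Draw n score-function gradient samples under two baselines.

    Toy setup (an assumption, not taken from the paper):
      policy  pi(a=1) = sigmoid(theta)
      reward  r = a + eps, with eps ~ N(0, sigma^2) independent of a
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + math.exp(-theta))  # probability of choosing a = 1
    state_grads, hindsight_grads = [], []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        eps = rng.gauss(0.0, sigma)      # environment randomness
        r = a + eps
        score = a - p                    # d/dtheta log pi(a) for a logit policy
        v_state = p                      # state value: E[r]
        v_hindsight = p + eps            # hindsight value: E[r | eps]
        state_grads.append(score * (r - v_state))
        hindsight_grads.append(score * (r - v_hindsight))
    return p, state_grads, hindsight_grads


if __name__ == "__main__":
    p, state_grads, hindsight_grads = gradient_samples(0.5, 2.0, 20000)
    print("true gradient     :", p * (1.0 - p))
    print("state baseline    :", statistics.mean(state_grads),
          "variance", statistics.variance(state_grads))
    print("hindsight baseline:", statistics.mean(hindsight_grads),
          "variance", statistics.variance(hindsight_grads))
```

Because `eps` does not depend on the action, both baselines leave the estimator unbiased in this toy case, while the hindsight baseline cancels the reward noise and so has lower variance. In a real environment the future is generally correlated with past actions, which is the issue the abstract's information-theoretic independence objective addresses.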

