Reward prediction for representation learning and reward shaping - Zhuanzhi (专知) Papers

May 7, 2021

Reward prediction for representation learning and reward shaping

Hlynur Davíð Hlynsson, Laurenz Wiskott

One of the fundamental challenges in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem becomes more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in environments with a single, terminating goal state. We augment the training of out-of-the-box RL agents by shaping the reward using our reward predictor during policy learning. Using our representation for preprocessing high-dimensional observations, as well as using the predictor for reward shaping, is shown to significantly enhance Actor Critic using Kronecker-factored Trust Region (ACKTR) and Proximal Policy Optimization (PPO) in single-goal environments with visual inputs.
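
The two ideas named in the abstract, a smoothed reward target and reward shaping with a predictor, can be illustrated with a small sketch. The abstract does not give the authors' exact formulas, so everything below is an assumption made for illustration: the Gaussian kernel used for smoothing, the potential-based shaping rule r + γ·φ(s') − φ(s) (a standard technique swapped in here, not confirmed as the paper's method), and the hypothetical names `smooth_rewards`, `shaped_reward`, and `toy_predictor`.

```python
import math
from typing import Callable, List, Sequence


def smooth_rewards(rewards: Sequence[float], sigma: float = 2.0) -> List[float]:
    """Spread each reward over neighbouring time steps of one episode.

    Assumption: a Gaussian kernel with peak value 1, so a single terminal
    reward of 1 keeps its value at the goal step and decays before it.
    """
    smoothed = []
    for t in range(len(rewards)):
        value = sum(
            r * math.exp(-((t - k) ** 2) / (2.0 * sigma ** 2))
            for k, r in enumerate(rewards)
        )
        smoothed.append(value)
    return smoothed


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    predictor: Callable[[int], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s).

    The predictor plays the role of the potential phi. The potential of a
    terminal state is taken to be 0 (an assumption of this sketch).
    """
    next_potential = 0.0 if done else predictor(next_state)
    return reward + gamma * next_potential - predictor(state)


def toy_predictor(state: int, goal: int = 4) -> float:
    """Hypothetical stand-in for a learned reward predictor on a 1-D corridor."""
    return math.exp(-((goal - state) ** 2) / 2.0)


# Sparse episode: the only reward arrives at the final (goal) step.
sparse = [0.0, 0.0, 0.0, 0.0, 1.0]
print(smooth_rewards(sparse, sigma=1.0))

# Stepping towards the goal earns a shaping bonus, stepping away a penalty.
print(shaped_reward(0.0, state=2, next_state=3, predictor=toy_predictor))
print(shaped_reward(0.0, state=3, next_state=2, predictor=toy_predictor))
```

In this toy setting the smoothed targets give a predictor a non-zero signal before the goal step, and the shaped reward gives the agent feedback on every transition rather than only at the end of the episode.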

Related content

Related topic: predictor/decision function

Related VIP content

Columbia University's latest "Machine Learning" course, Fall-B 2020 (Machine Learning)

Zhuanzhi Member Service

38+ reads · November 3, 2020

Causal Graphs, 52-page ppt

Zhuanzhi Member Service

241+ reads · April 19, 2020

Latest collection of 100+ papers on Self-Supervised Learning

Zhuanzhi Member Service

161+ reads · March 18, 2020

Meta-transfer Learning for Few-shot Learning

Zhuanzhi Member Service

157+ reads · February 29, 2020

[University of Oxford] Deep Residual Reinforcement Learning

Zhuanzhi Member Service

82+ reads · February 18, 2020

[WSDM2020 Tutorial] Learning and Reasoning on Graph for Recommendation, 130-page ppt, National University of Singapore

Zhuanzhi Member Service

96+ reads · February 7, 2020

[CVPR 2019 | tutorial] Deep Reinforcement Learning for Computer Vision

Zhuanzhi Member Service

52+ reads · November 28, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Service

45+ reads · October 17, 2019

Stabilizing Transformers for Reinforcement Learning

Zhuanzhi Member Service

57+ reads · October 17, 2019

[Pieter Abbeel talk @ CMU] Meta-learning and deep reinforcement learning for robotics, Deep Learning to Learn, 84-page ppt

Zhuanzhi Member Service

31+ reads · October 12, 2019

Related news

Successor representations: biological inspiration for reinforcement learning representations

CreateAMind

6+ reads · September 5, 2019

Hierarchically Structured Meta-learning

CreateAMind

23+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

26+ reads · May 18, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

41+ reads · January 3, 2019

A discussion of the assumptions behind disentanglement

CreateAMind

9+ reads · December 10, 2018

disentangled-representation-papers

CreateAMind

26+ reads · September 12, 2018

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+ reads · May 25, 2018

Hierarchical Disentangled Representations

CreateAMind

4+ reads · April 15, 2018

[Recommended] Direct future prediction: supervised learning for reinforcement learning

Machine Learning Research Society (机器学习研究会)

6+ reads · November 24, 2017

Reinforcement learning cartpole_a3c

CreateAMind

9+ reads · July 21, 2017

Related papers

Graph Learning: A Survey

arXiv

56+ reads · May 3, 2021

Multi-Task Learning for Dense Prediction Tasks: A Survey

arXiv

5+ reads · September 16, 2020

Deep Learning for Learning Graph Representations

arXiv

35+ reads · January 2, 2020

Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

arXiv

18+ reads · December 25, 2019

Inferred successor maps for better transfer learning

arXiv

3+ reads · July 2, 2019

Representation Learning with Contrastive Predictive Coding

arXiv

6+ reads · January 22, 2019

Learning to Walk via Deep Reinforcement Learning

arXiv

7+ reads · December 26, 2018

Large-Scale Study of Curiosity-Driven Learning

arXiv

8+ reads · August 13, 2018

Deep Learning

arXiv

6+ reads · August 3, 2018

Learning Representative Temporal Features for Action Recognition

arXiv

4+ reads · March 14, 2018
