We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards. The proposed framework alternates between classification-based reward inference and policy update steps -- the original sparse reward provides a self-supervisory signal for reward inference by ranking the trajectories that the agent observes, while the policy update is performed with the newly inferred, typically dense reward function. We present theory showing that, under certain conditions, replacing the original sparse reward with the inferred dense reward does not change the optimal policy of the original MDP, while potentially speeding up learning significantly. Experimental results on several sparse-reward domains demonstrate that the proposed algorithm is not only significantly more sample efficient than a standard RL baseline using sparse rewards, but at times also matches the sample efficiency obtained with hand-designed dense reward functions.
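The sketch below illustrates the alternation described above under some assumptions of our own: a PyTorch reward network trained with a pairwise ranking (cross-entropy) loss over trajectories ordered by their original sparse returns, and a generic RL agent. Names such as `RewardNet`, `collect_trajectory`, `sample_pair`, `relabel`, and `agent.update` are hypothetical placeholders, not components specified in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Learned dense reward: maps each state to a scalar per-step reward."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states):               # states: (T, state_dim)
        return self.net(states).squeeze(-1)  # per-step rewards: (T,)

def ranking_loss(reward_net, traj_a, traj_b):
    """Classification-based reward inference: the trajectory with the higher
    original sparse return should get the higher cumulative learned reward."""
    ret_a = reward_net(traj_a["states"]).sum()
    ret_b = reward_net(traj_b["states"]).sum()
    # Label 1 means traj_a is ranked above traj_b by the sparse return.
    label = torch.tensor(1.0 if traj_a["sparse_return"] > traj_b["sparse_return"] else 0.0)
    return F.binary_cross_entropy_with_logits(ret_a - ret_b, label)

def sors_loop(agent, env, reward_net, reward_opt, iterations, pairs_per_iter):
    buffer = []  # trajectories labeled with their original sparse returns
    for _ in range(iterations):
        traj = collect_trajectory(agent, env)    # hypothetical helper
        buffer.append(traj)
        # Step 1: self-supervised reward inference from sparse-return rankings.
        for _ in range(pairs_per_iter):
            a, b = sample_pair(buffer)           # hypothetical helper
            reward_opt.zero_grad()
            ranking_loss(reward_net, a, b).backward()
            reward_opt.step()
        # Step 2: policy update with the newly inferred dense reward.
        agent.update(relabel(traj, reward_net))  # hypothetical helper
```

The cross-entropy over the difference of cumulative learned rewards is one common way to turn return-based trajectory rankings into a reward-learning signal; the policy update step can wrap any off-the-shelf RL algorithm, with the learned dense reward substituted for the sparse one when relabeling transitions.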