Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from the environment are sparse or even entirely disregarded. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on conditional variational inference to model this multimodality and stochasticity. We treat the environmental state-action transition as a conditional generative process, generating the next-state prediction conditioned on the current state, the action, and a latent variable. We derive an upper bound on the negative log-likelihood of the environmental transition and use this upper bound as the intrinsic reward for exploration, which allows the agent to learn skills through self-supervised exploration without observing extrinsic rewards. We evaluate the proposed method on several image-based simulation tasks and a real robotic manipulation task. Our method outperforms several state-of-the-art environment-model-based exploration approaches.
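For concreteness, the upper bound described above follows the standard conditional-VAE (negative evidence lower bound) decomposition; the sketch below uses our own notation ($s_t$, $a_t$, $z$ for state, action, and latent variable; $q_\phi$ an approximate posterior; $p_\theta$ the decoder and conditional prior), which may differ from the paper's exact parameterization:

$$
-\log p_\theta(s_{t+1}\mid s_t, a_t)
\;\le\;
\mathbb{E}_{q_\phi(z\mid s_t, a_t, s_{t+1})}\!\big[-\log p_\theta(s_{t+1}\mid s_t, a_t, z)\big]
\;+\;
D_{\mathrm{KL}}\!\big(q_\phi(z\mid s_t, a_t, s_{t+1})\,\big\|\,p_\theta(z\mid s_t, a_t)\big).
$$

Under this reading, the right-hand side serves as the per-transition intrinsic reward: transitions that the model reconstructs poorly, or whose latent posterior deviates strongly from the conditional prior, yield high reward and are therefore sought out by the exploring agent.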