Skill-based Model-based Reinforcement Learning - Zhuanzhi (专知) Papers

July 15, 2022

Skill-based Model-based Reinforcement Learning

Lucy Xiaoyang Shi, Joseph J. Lim, Youngwoon Lee

From arXiv. Website: https://clvrai.com/skimo

Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors by leveraging a learned single-step dynamics model to plan actions in imagination. However, planning every action for long-horizon tasks is not practical, akin to a human planning out every muscle movement. Instead, humans efficiently plan with high-level skills to solve complex tasks. From this intuition, we propose a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space using a skill dynamics model, which directly predicts the skill outcomes, rather than predicting all small details in the intermediate states, step by step. For accurate and efficient long-term planning, we jointly learn the skill dynamics model and a skill repertoire from prior experience. We then harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space, which enables efficient downstream learning of long-horizon, sparse reward tasks. Experimental results in navigation and manipulation domains show that SkiMo extends the temporal horizon of model-based approaches and improves the sample efficiency for both model-based RL and skill-based RL. Code and videos are available at https://clvrai.com/skimo.
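
The abstract's central idea, planning over whole skills rather than single actions, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its code: `skill_dynamics` is a hand-written stand-in for the learned skill dynamics model, the 2-D state and displacement-style skills are invented, the planner is plain random shooting, and candidates are scored by a dense distance to the goal rather than a sparse reward.

```python
import math
import random
from typing import List, Optional, Tuple

State = Tuple[float, float]  # toy 2-D position (assumption, not from the paper)
Skill = Tuple[float, float]  # toy skill latent: net displacement of one skill


def skill_dynamics(state: State, skill: Skill) -> State:
    """Hypothetical stand-in for a learned skill dynamics model.

    Maps (state, skill) directly to the state reached once the whole skill
    has finished, without simulating the low-level steps in between.
    """
    return (state[0] + skill[0], state[1] + skill[1])


def distance(a: State, b: State) -> float:
    """Euclidean distance between two toy states."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def rollout(start: State, plan: List[Skill]) -> State:
    """Imagine executing a sequence of skills: one model call per skill."""
    state = start
    for skill in plan:
        state = skill_dynamics(state, skill)
    return state


def plan_skills(
    start: State,
    goal: State,
    horizon: int = 5,
    num_candidates: int = 256,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Skill], float]:
    """Random-shooting planner in skill space (an illustrative choice).

    Samples candidate skill sequences, scores each by the imagined final
    distance to the goal, and returns the best sequence with that distance.
    """
    rng = rng or random.Random(0)
    best_plan: List[Skill] = []
    best_dist = math.inf
    for _ in range(num_candidates):
        plan = [
            (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            for _ in range(horizon)
        ]
        dist = distance(rollout(start, plan), goal)
        if dist < best_dist:
            best_plan, best_dist = plan, dist
    return best_plan, best_dist


if __name__ == "__main__":
    start, goal = (0.0, 0.0), (3.0, 2.0)
    plan, dist = plan_skills(start, goal)
    print(f"planned {len(plan)} skills, imagined final distance {dist:.3f}")
```

In this sketch each imagined rollout costs `horizon` model calls regardless of how many low-level actions a skill would take to execute, which is the property the abstract attributes to predicting skill outcomes directly.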
