Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses the discovered routines to augment policy learning. To discover routines from the demonstration, we first abstract routine candidates by inducing a grammar over the demonstrated action trajectory. The best candidates, measured by length and frequency, are then selected to form a routine library. With the discovered routines, we learn the policy simultaneously at the primitive level and the routine level, leveraging the temporal structure of routines. Our approach enables imitating expert behavior at multiple temporal scales in imitation learning and promotes exploration in reinforcement learning. Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and the reinforcement learning method A2C. Furthermore, we show that the discovered routines generalize to unseen levels and difficulties on the CoinRun benchmark.
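To make the routine-discovery step concrete, below is a minimal sketch in Python. It is an illustrative approximation, not the paper's exact procedure: instead of full grammar induction over the action trajectory, it enumerates repeated action n-grams and scores each candidate by length times frequency, keeping the top-scoring subsequences as the routine library. The function and parameter names (`discover_routines`, `max_len`, `library_size`) are hypothetical.

```python
# Illustrative sketch of routine discovery from a single demonstrated
# action trajectory. Assumption: routines are repeated subsequences of
# primitive actions, scored by length * frequency (the paper's selection
# criteria); the grammar-induction step is approximated by n-gram counting.
from collections import Counter

def discover_routines(actions, max_len=4, library_size=8):
    """Return the top-K repeated action subsequences as routines."""
    candidates = Counter()
    for n in range(2, max_len + 1):            # routines span >= 2 actions
        for i in range(len(actions) - n + 1):
            candidates[tuple(actions[i:i + n])] += 1
    # Keep only subsequences that actually repeat; score by length * count.
    scored = {r: len(r) * c for r, c in candidates.items() if c > 1}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:library_size]

# Example: a demonstration over primitive actions 0..3 (e.g., Atari moves).
demo = [0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 3]
for routine in discover_routines(demo):
    print(routine)
```

Under this sketch, each selected routine could then be exposed to the agent as a macro-action alongside the primitive actions, which is one way to realize learning at both temporal levels.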