Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation


November 11, 2021


I-Chun Arthur Liu, Shagun Uppal, Gaurav S. Sukhatme, Joseph J. Lim, Peter Englert, Youngwoon Lee

From arXiv. Published at the Conference on Robot Learning (CoRL) 2021.

Learning complex manipulation tasks in realistic, obstructed environments is a challenging problem due to hard exploration in the presence of obstacles and high-dimensional visual observations. Prior work tackles the exploration problem by integrating motion planning and reinforcement learning. However, the motion planner augmented policy requires access to state information, which is often not available in real-world settings. To this end, we propose to distill a state-based motion planner augmented policy into a visual control policy via (1) visual behavioral cloning, which removes the motion planner dependency along with its jittery motion, and (2) vision-based reinforcement learning guided by the smoothed trajectories from the behavioral cloning agent. We evaluate our method on three manipulation tasks in obstructed environments and compare it against various reinforcement learning and imitation learning baselines. The results demonstrate that our framework is highly sample-efficient and outperforms the state-of-the-art algorithms. Moreover, coupled with domain randomization, our policy is capable of zero-shot transfer to unseen environment settings with distractors. Code and videos are available at https://clvrai.com/mopa-pd
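The first distillation step described in the abstract, behavioral cloning from a state-based teacher into a policy that only sees observations, can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D task, the jittery `teacher_action`, the `observe` function standing in for an image, and the linear student are all hypothetical choices made only to show the idea that regressing on teacher actions removes the dependency on state and averages out the jitter.

```python
import random


def teacher_action(state, rng):
    """Hypothetical state-based teacher: steer toward 0, with jitter added."""
    return -0.5 * state + rng.uniform(-0.2, 0.2)


def observe(state):
    """Hypothetical observation (stand-in for an image): an affine reading of the state."""
    return 2.0 * state + 1.0


def collect_demos(n, seed=0):
    """Roll out the teacher and record (observation, action) pairs only."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        state = rng.uniform(-1.0, 1.0)
        demos.append((observe(state), teacher_action(state, rng)))
    return demos


def fit_linear_policy(demos):
    """Behavioral cloning as 1-D least squares: action ~ w * obs + b."""
    n = len(demos)
    mean_o = sum(o for o, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in demos)
    var = sum((o - mean_o) ** 2 for o, _ in demos)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b


demos = collect_demos(2000)
w, b = fit_linear_policy(demos)


def student_action(obs):
    """Student policy: needs only the observation, never the underlying state."""
    return w * obs + b


print(f"w={w:.3f}, b={b:.3f}")
```

In this toy setup the jitter-free teacher corresponds to `action = -0.25 * obs + 0.25`, so the fitted `w` and `b` should land close to those values. In the method the abstract describes, the cloned agent's smoothed trajectories then guide a second stage of vision-based reinforcement learning, which this sketch does not cover.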

