模拟离线模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟学习 (Discriminator-Guided Model-Based Offline Imitation Learning) - 专知论文

会员服务 ·

0

Learning · Performer · MoDELS · 稳健性 · 判别器 ·

2023 年 1 月 10 日

Discriminator-Guided Model-Based Offline Imitation Learning

翻译：模拟离线模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟学习

Wenjia Zhang,Haoran Xu,Haoyi Niu,Peng Cheng,Ming Li,Heming Zhang,Guyue Zhou,Xianyuan Zhan

from arxiv, This work has been accepted by CoRL 2022

Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data. Including a learned dynamics model can potentially improve the state-action space coverage of expert data, however, it also faces challenging issues like model approximation/generalization errors and suboptimality of rollout data. In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning strategy, which uses the discriminator to guide and couple the learning process of the policy and dynamics model, resulting in improved model performance and robustness. Our framework can also be extended to the case when demonstrations contain a large proportion of suboptimal data. Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.

翻译：离线模拟学习(IL)是解决专家无报酬标签演示产生的决策问题的有力方法; 现有的离线模拟方法在有限的专家数据下出现严重性能退化; 包括一个学习的动态模型有可能改善专家数据的州-行动空间覆盖范围,然而,它也面临一些具有挑战性的问题,如模型近似/概括性错误和推出数据的不理想性。在本文件中,我们提议了以差异为指南的基于模型的离线脱线学习(DMIL)框架,该框架引入了一种区分器,以同时区分模型推出数据的动态正确性和亚优度与实际专家演示相比。 DMIL采用一种新的合作式对立式学习战略,利用歧视器指导并结合政策和动态模型的学习过程,从而改进模型性能和稳健性。我们的框架还可以扩大到演示包含大量亚优性数据的情况。实验结果显示,DMIL及其扩展与小型数据设置的状态离线下IL方法相比,取得了优异性能和稳健性。

0

相关内容

Learning

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

最优控制问题H1-Galerkin混合有限元方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于正交级数展开的多体系统混合不确定性研究

国家自然科学基金

1+阅读 · 2015年12月31日

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

激光粒子加速及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

导轨式电磁发射装置用新型混合储能技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

黄瓜ERF转录因子CsERF1调控耐涝性的分子机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

GNSS/INS深组合系统中载波跟踪性能与IMU误差之间映射关系的理论与方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

最优控制问题自适应混合有限元方法

国家自然科学基金

0+阅读 · 2009年12月31日

Learning Reward Functions for Robotic Manipulation by Observing Humans

Arxiv

0+阅读 · 2023年3月7日

Taming Self-Supervised Learning for Presentation Attack Detection: In-Image De-Folding and Out-of-Image De-Mixing

Arxiv

0+阅读 · 2023年3月7日

Filter Pruning based on Information Capacity and Independence

Arxiv

0+阅读 · 2023年3月7日

Guided Image-to-Image Translation by Discriminator-Generator Communication

Arxiv

0+阅读 · 2023年3月7日

Time series anomaly detection with sequence reconstruction based state-space model

Arxiv

0+阅读 · 2023年3月6日

Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains

Arxiv

0+阅读 · 2023年3月6日

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Arxiv

0+阅读 · 2023年3月3日

Automated Task-Time Interventions to Improve Teamwork using Imitation Learning

Arxiv

0+阅读 · 2023年3月2日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型中的检索与结构化增强生成综述

《实现多层防御多轮交战机制的扩展型随机齐射模型》2025年最新83页

【CMU博士论文】交互驱动的人体动作估计与生成

如何避免生成式人工智能在作战中失控失效

相关资讯

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

相关论文

Learning Reward Functions for Robotic Manipulation by Observing Humans

Arxiv

0+阅读 · 2023年3月7日

Taming Self-Supervised Learning for Presentation Attack Detection: In-Image De-Folding and Out-of-Image De-Mixing

Arxiv

0+阅读 · 2023年3月7日

Filter Pruning based on Information Capacity and Independence

Arxiv

0+阅读 · 2023年3月7日

Guided Image-to-Image Translation by Discriminator-Generator Communication

Arxiv

0+阅读 · 2023年3月7日

Time series anomaly detection with sequence reconstruction based state-space model

Arxiv

0+阅读 · 2023年3月6日

Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains

Arxiv

0+阅读 · 2023年3月6日

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Arxiv

0+阅读 · 2023年3月3日

Automated Task-Time Interventions to Improve Teamwork using Imitation Learning

Arxiv

0+阅读 · 2023年3月2日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

相关基金

最优控制问题H1-Galerkin混合有限元方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于正交级数展开的多体系统混合不确定性研究

国家自然科学基金

1+阅读 · 2015年12月31日

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

激光粒子加速及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

导轨式电磁发射装置用新型混合储能技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

黄瓜ERF转录因子CsERF1调控耐涝性的分子机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

GNSS/INS深组合系统中载波跟踪性能与IMU误差之间映射关系的理论与方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

最优控制问题自适应混合有限元方法

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员