模拟学习排名游戏 (A Ranking Game for Imitation Learning) - 专知论文

会员服务 ·

0

Learning · 秩 · Agent · 逆强化学习 · state-of-the-art ·

2023 年 1 月 16 日

A Ranking Game for Imitation Learning

翻译：模拟学习排名游戏

Harshit Sikchi,Akanksha Saran,Wonjoon Goo,Scott Niekum

from arxiv, Published in Transactions on Machine Learning Research 2023. 38 pages

We propose a new framework for imitation learning -- treating imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. In imitation learning, near-optimal expert data can be difficult to obtain, and even in the limit of infinite data cannot imply a total ordering over trajectories as preferences can. On the other hand, learning from preferences alone is challenging as a large number of preferences are required to infer a high-dimensional reward function, though preference data is typically much easier to collect than expert demonstrations. The classical inverse reinforcement learning (IRL) formulation learns from expert demonstrations but provides no mechanism to incorporate learning from offline preferences and vice versa. We instantiate the proposed ranking-game framework with a novel ranking loss giving an algorithm that can simultaneously learn from expert demonstrations and preferences, gaining the advantages of both modalities. Our experiments show that the proposed method achieves state-of-the-art sample efficiency and can solve previously unsolvable tasks in the Learning from Observation (LfO) setting. Project video and code can be found at https://hari-sikchi.github.io/rank-game/

翻译：我们提出了一个模仿学习的新框架 -- -- 将模仿作为政策与奖励之间的双玩排名游戏。在这个游戏中,奖赏机构学会满足行为之间的双向业绩排名,而政策机构则学会尽量扩大这种奖赏。在模仿学习中,几乎最佳的专家数据可能难以获得,甚至在无限数据的限制中,也不可能像偏好那样对轨迹进行完全排序。另一方面,仅仅从偏好中学习是一个挑战性的问题,因为需要大量偏好来推导高维度的奖赏功能,尽管偏好数据通常比专家演示更容易收集。典型的反向强化学习(IRL)的提法从专家演示中学习,但没有提供机制将离线偏好和反之学习纳入其中。我们用新的排名损失来回拨拟议的排名框架,提供一种算法,既可以从专家的演示和偏好中学习,又获得两种模式的优势。我们的实验表明,拟议的方法达到了最先进的样效率,可以比专家演示所收集的要容易得多。典型的反向强化学习的学习方法(IRLL)从专家演示中学习,但没有机制。我们从观察中找到的MA-hari/O 设置的视频代码。我们可以在观察/ROK-IK-O设置中找到的视频/Procard/Procard 。

0

相关内容

Learning

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

161+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

microRNA在缺血再灌注致急性肾损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

利用中子散射研究过掺杂Pr1-xLaCexCuO4-y的低能自旋激发

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

microRNA调控HIF-1α介导颞叶癫痫耐药的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

干旱诱导表达的苹果AsA转运蛋白功能和在抗逆中的作用分析

国家自然科学基金

0+阅读 · 2011年12月31日

生长素在harpinXoo激发的过敏性反应中的角色及其分子调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

组蛋白乙酰化/去乙酰化对Myocardin诱导的心肌肥厚影响及机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

可见光激发的单线态氧磷光探针的研究

国家自然科学基金

0+阅读 · 2009年12月31日

AlGaN基PIN太阳光盲雪崩探测器研究

国家自然科学基金

0+阅读 · 2008年12月31日

Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning

Arxiv

0+阅读 · 2023年3月10日

Structured State Space Models for In-Context Reinforcement Learning

Arxiv

0+阅读 · 2023年3月9日

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Arxiv

0+阅读 · 2023年3月9日

Reward Informed Dreamer for Task Generalization in Reinforcement Learning

Arxiv

0+阅读 · 2023年3月9日

Learning swimming via deep reinforcement learning

Arxiv

0+阅读 · 2023年3月9日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

Using Memory-Based Learning to Solve Tasks with State-Action Constraints

Arxiv

0+阅读 · 2023年3月8日

Financial Time Series Representation Learning

Financial Time Series Representation Learning

Arxiv

10+阅读 · 2020年3月27日

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Arxiv

15+阅读 · 2019年1月23日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

VIP会员

文章信息

相关主题

逆强化学习

state-of-the-art

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

161+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【斯坦福博士论文】基础模型后训练的新方法

欧盟防务准备路线图：目标、冲突与2030之路（附“2030年防务准备路线图”原文）

【AAAI2026】模型不确定性下的在线鲁棒规划：一种基于采样的方法

Transformers 出现以来关系抽取任务的系统综述

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning

Arxiv

0+阅读 · 2023年3月10日

Structured State Space Models for In-Context Reinforcement Learning

Arxiv

0+阅读 · 2023年3月9日

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Arxiv

0+阅读 · 2023年3月9日

Reward Informed Dreamer for Task Generalization in Reinforcement Learning

Arxiv

0+阅读 · 2023年3月9日

Learning swimming via deep reinforcement learning

Arxiv

0+阅读 · 2023年3月9日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

Using Memory-Based Learning to Solve Tasks with State-Action Constraints

Arxiv

0+阅读 · 2023年3月8日

Financial Time Series Representation Learning

Financial Time Series Representation Learning

Arxiv

10+阅读 · 2020年3月27日

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Arxiv

15+阅读 · 2019年1月23日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

相关基金

microRNA在缺血再灌注致急性肾损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

利用中子散射研究过掺杂Pr1-xLaCexCuO4-y的低能自旋激发

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

microRNA调控HIF-1α介导颞叶癫痫耐药的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

干旱诱导表达的苹果AsA转运蛋白功能和在抗逆中的作用分析

国家自然科学基金

0+阅读 · 2011年12月31日

生长素在harpinXoo激发的过敏性反应中的角色及其分子调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

组蛋白乙酰化/去乙酰化对Myocardin诱导的心肌肥厚影响及机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

可见光激发的单线态氧磷光探针的研究

国家自然科学基金

0+阅读 · 2009年12月31日

AlGaN基PIN太阳光盲雪崩探测器研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员