The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, the discovery of behaviours that switch at bottleneck states has long been sought after for inducing plans of minimum description length across tasks. Prior approaches have either supported only online, on-policy bottleneck-state discovery, limiting sample efficiency, or have been restricted to discrete state-action domains, limiting applicability. To address this, we introduce Model-Based Offline Options (MO2), an offline hindsight framework supporting sample-efficient bottleneck option discovery over continuous state-action spaces. Once bottleneck options are learnt offline over source domains, they are transferred online to improve exploration and value estimation on the transfer domain. Our experiments show that on complex long-horizon continuous control tasks with sparse, delayed rewards, MO2's properties are essential and lead to performance exceeding recent option-learning methods. Additional ablations further demonstrate the impact on option predictability and credit assignment.