Contextual Conservative Q-Learning for Offline Reinforcement Learning - 专知 (Zhuanzhi) Papers


January 16, 2023

Contextual Conservative Q-Learning for Offline Reinforcement Learning


Ke Jiang, Jiayu Yao, Xiaoyang Tan

Comment from arXiv: the work is not finished.

Offline reinforcement learning learns an effective policy from offline datasets without online interaction, and it attracts persistent research attention because of its potential for practical application. However, extrapolation error caused by distribution shift still leads to overestimation for actions that transition to out-of-distribution (OOD) states, which degrades the reliability and robustness of the offline policy. In this paper, we propose Contextual Conservative Q-Learning (C-CQL) to learn a robustly reliable policy through the contextual information captured via an inverse dynamics model. Under the supervision of the inverse dynamics model, it tends to learn a policy that generates stable transitions at perturbed states, since perturbed states are a common kind of OOD state. In this manner, we make the learned policy more likely to generate transitions that lead to the empirical next-state distribution of the offline dataset, i.e., robustly reliable transitions. Besides, we theoretically show that C-CQL is a generalization of Conservative Q-Learning (CQL) and aggressive State Deviation Correction (SDC). Finally, experimental results demonstrate that the proposed C-CQL achieves state-of-the-art performance in most environments of the offline MuJoCo suite and in a noisy MuJoCo setting.
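The abstract gives no equations, so the following is only a toy sketch of the two ingredients it names, not the authors' C-CQL objective. It assumes a discrete action set for the conservative term (the standard CQL log-sum-exp regularizer) and a hypothetical one-dimensional dynamics s' = s + a, for which the inverse dynamics model is exact. All function names and the noise scale are illustrative assumptions.

```python
import math
import random


def cql_penalty(q_values, data_action):
    """CQL-style regularizer for a single state with discrete actions:
    log-sum-exp of Q over all actions minus Q of the dataset action.
    It is larger when actions outside the data have high Q-values."""
    m = max(q_values)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[data_action]


def toy_inverse_dynamics(state, next_state):
    """Assumed toy dynamics s' = s + a, so the action that moves
    `state` to `next_state` is simply their difference."""
    return next_state - state


def corrective_action(state, next_state, noise_scale, rng):
    """Perturb a dataset state (a simple stand-in for an OOD state) and ask
    the inverse dynamics model which action returns to the dataset next state."""
    perturbed = state + rng.gauss(0.0, noise_scale)
    action = toy_inverse_dynamics(perturbed, next_state)
    return perturbed, action


rng = random.Random(0)
s, s_next = 1.0, 1.5                      # one (state, next state) pair from a toy dataset
s_hat, a_hat = corrective_action(s, s_next, 0.1, rng)
reached = s_hat + a_hat                   # apply the toy dynamics from the perturbed state
penalty = cql_penalty([1.0, 2.0, 0.5], data_action=1)

print(f"perturbed state {s_hat:.3f}, corrective action {a_hat:.3f}, reached {reached:.3f}")
print(f"conservative penalty {penalty:.3f}")
```

In this toy setting the action suggested at the perturbed state leads back to the dataset's next state, which is the kind of "robustly reliable transition" the abstract describes. How C-CQL actually combines such supervision with the conservative term is specified in the paper, not here.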


