Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization - 专知论文


March 3, 2023

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since they can extract more learning signals from the logged dataset by learning a model of the environment. However, the performance of existing model-based approaches falls short of model-free counterparts, due to the compounding of estimation errors in the learned model. Driven by this observation, we argue that it is critical for a model-based method to understand when to trust the model and when to rely on model-free estimates, and how to act conservatively with respect to both. To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), which trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties, and facilitates conservatism by taking a lower bound on the Bayesian posterior value estimate. On the standard D4RL continuous control tasks, we find that our method significantly outperforms previous model-based approaches: e.g., MOPO by 116.4%, MOReL by 23.2%, and COMBO by 23.7%. Further, CBOP achieves state-of-the-art performance on 11 out of 18 benchmark datasets while performing on par on the remaining datasets.
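
The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so the following assumes one standard way of fusing estimates by their uncertainty: each candidate value estimate (for example, a model-free estimate and a model-based rollout estimate) is treated as an independent Gaussian observation of the true value, the estimates are combined by inverse-variance weighting, and a conservative target is formed by subtracting a multiple of the posterior standard deviation. The function name `conservative_posterior_value` and the coefficient `psi` are hypothetical, chosen only for this illustration.

```python
import math


def conservative_posterior_value(means, variances, psi=1.0):
    """Fuse value estimates and return (posterior_mean, posterior_std, lower_bound).

    Assumption (not taken from the abstract): each estimate is an independent
    Gaussian observation of the true value and the prior is flat, so the
    posterior is Gaussian with inverse-variance (precision) weighting.
    """
    if not means or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal in length")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Precision = 1 / variance: uncertain estimates receive small weights.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    # Precision-weighted average of the candidate estimates.
    post_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    # Posterior variance is the inverse of the summed precisions.
    post_std = math.sqrt(1.0 / total_precision)

    # Conservative target: a lower bound on the posterior value estimate.
    lower_bound = post_mean - psi * post_std
    return post_mean, post_std, lower_bound


# Hypothetical numbers: a confident model-free estimate (variance 1.0) and a
# less certain model-based estimate (variance 4.0).
mean, std, lcb = conservative_posterior_value([10.0, 14.0], [1.0, 4.0], psi=1.0)
print(mean, std, lcb)
```

In this toy example the fused mean sits closer to the lower-variance estimate, and a larger `psi` gives a more pessimistic target. How the uncertainties themselves are obtained is outside the scope of this sketch.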
