Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation - Zhuanzhi (专知) Papers


Estimation/Estimators · Deep Reinforcement Learning · Samples · Learning · Optimizers

May 3, 2022

Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation


Vincent Mai, Kaustubh Mani, Liam Paull

From arXiv, ICLR 2022 [Spotlight]

In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. As this noise is heteroscedastic, its effects can be mitigated using uncertainty-based weights in the optimization process. Previous methods rely on sampled ensembles, which do not capture all aspects of uncertainty. We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL, and introduce inverse-variance RL, a Bayesian framework which combines probabilistic ensembles and Batch Inverse Variance weighting. We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision. Our results show significant improvement in terms of sample efficiency on discrete and continuous control tasks.

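To make the idea of uncertainty-based weights concrete, here is a minimal sketch of inverse-variance weighting over a mini-batch of value targets. It is not the authors' implementation: the function names, the constant `eps`, and the use of plain ensemble disagreement as the variance estimate are all assumptions made for illustration. The abstract only states that probabilistic ensembles are combined with Batch Inverse Variance weighting.

```python
from statistics import pvariance


def ensemble_target_variance(member_targets):
    """Population variance of one target across ensemble members.

    Assumption: disagreement between ensemble members is used here as a
    simple stand-in for the uncertainty of the target.
    """
    return pvariance(member_targets)


def biv_weights(variances, eps=0.1):
    """Normalized inverse-variance weights for one mini-batch.

    eps (hypothetical default) keeps low-variance samples from dominating
    the batch and avoids division by zero.
    """
    raw = [1.0 / (v + eps) for v in variances]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_td_loss(predictions, targets, variances, eps=0.1):
    """Squared error where noisier targets contribute less to the loss."""
    weights = biv_weights(variances, eps)
    return sum(w * (p - t) ** 2
               for w, p, t in zip(weights, predictions, targets))


# Illustrative numbers only: two transitions, three ensemble members each.
ensemble_targets = [[1.0, 1.1, 0.9], [2.0, 4.0, 0.0]]
variances = [ensemble_target_variance(m) for m in ensemble_targets]
weights = biv_weights(variances)
loss = weighted_td_loss([1.2, 1.5], [1.0, 2.0], variances)
```

In this sketch the second transition, whose ensemble members disagree strongly, receives a much smaller weight than the first. When all variances are equal the weights become uniform and the loss reduces to an ordinary mean squared error.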

