9 June 2022

The Phenomenon of Policy Churn


Tom Schaul, André Barreto, John Quan, Georg Ostrovski

We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states within a handful of learning updates (in a typical deep RL set-up such as DQN on Atari). We characterise the phenomenon empirically, verifying that it is not limited to specific algorithm or environment properties. A number of ablations help whittle down the plausible explanations on why churn occurs to just a handful, all related to deep learning. Finally, we hypothesise that policy churn is a beneficial but overlooked form of implicit exploration that casts $\epsilon$-greedy exploration in a fresh light, namely that $\epsilon$-noise plays a much smaller role than expected.
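
To make the quantity described above concrete, the following is a minimal sketch of one natural way to measure policy churn: the fraction of states whose greedy (argmax) action differs before and after an update. It uses small tabular Q-value arrays as a stand-in for a deep network evaluated on a batch of states. The function names, the toy values, and the tabular setting are illustrative assumptions and are not taken from the paper. The helper `epsilon_greedy_deviation` additionally assumes that the random action in $\epsilon$-greedy is drawn uniformly over all actions, including the greedy one.

```python
import numpy as np


def greedy_actions(q_values: np.ndarray) -> np.ndarray:
    """Greedy action per state; q_values has shape (num_states, num_actions)."""
    return np.argmax(q_values, axis=1)


def policy_churn(q_before: np.ndarray, q_after: np.ndarray) -> float:
    """Fraction of states whose greedy action differs between two sets of Q-values."""
    changed = greedy_actions(q_before) != greedy_actions(q_after)
    return float(np.mean(changed))


def epsilon_greedy_deviation(epsilon: float, num_actions: int) -> float:
    """Probability that an epsilon-greedy action differs from the greedy one.

    Assumes the exploratory action is uniform over all actions (greedy included).
    """
    return epsilon * (1.0 - 1.0 / num_actions)


# Toy example (hypothetical values): 4 states, 3 actions.
q_before = np.array([
    [1.0, 0.5, 0.2],
    [0.1, 0.9, 0.3],
    [0.4, 0.4, 0.8],
    [0.7, 0.6, 0.1],
])

# Pretend a learning update nudged two entries.
q_after = q_before.copy()
q_after[0, 1] = 1.2   # state 0: greedy action flips from 0 to 1
q_after[3, 1] = 0.65  # state 3: greedy action stays 0 (0.7 > 0.65)

churn = policy_churn(q_before, q_after)
print(f"churn: {churn:.2f}")
print(f"epsilon-greedy deviation: {epsilon_greedy_deviation(0.1, 4):.3f}")
```

Comparing the two numbers for a given agent gives a rough sense of how much of the deviation from the previous greedy policy comes from churn and how much from explicit $\epsilon$-noise, which is the comparison the abstract's hypothesis rests on.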

