On Proximal Policy Optimization's Heavy-tailed Gradients - 专知 (Zhuanzhi) Papers

Estimation/Estimators · Optimization · Robustness · Gradient clipping · Continuity

July 13, 2021

On Proximal Policy Optimization's Heavy-tailed Gradients

Saurabh Garg,Joshua Zhanson,Emilio Parisotto,Adarsh Prasad,J. Zico Kolter,Zachary C. Lipton,Sivaraman Balakrishnan,Ruslan Salakhutdinov,Pradeep Ravikumar

From arXiv, ICML 2021

Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function. We demonstrate that the gradients, especially for the actor network, exhibit pronounced heavy-tailedness and that it increases as the agent's policy diverges from the behavioral policy (i.e., as the agent goes further off policy). Further examination implicates the likelihood ratios and advantages in the surrogate reward as the main sources of the observed heavy-tailedness. We then highlight issues arising due to the heavy-tailed nature of the gradients. In this light, we study the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in gradients. Thus motivated, we propose incorporating GMOM, a high-dimensional robust estimator, into PPO as a substitute for three clipping tricks. Despite requiring less hyperparameter tuning, our method matches the performance of PPO (with all heuristics enabled) on a battery of MuJoCo continuous control tasks.
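The abstract names GMOM as the robust estimator but gives no implementation details. The sketch below is only an illustration, assuming GMOM denotes a geometric median-of-means estimator: samples are split into blocks, each block is averaged, and the geometric median of the block means is returned. The function names (`block_means`, `geometric_median`, `gmom`), the strided block split, the block count, and the use of Weiszfeld's iteration for the geometric median are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def block_means(samples, num_blocks):
    """Split equal-length vectors into blocks and average each block."""
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    dim = len(samples[0])
    means = []
    for b in range(num_blocks):
        block = samples[b::num_blocks]  # strided split (an assumption); never empty
        means.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return means


def geometric_median(points, num_iters=100, eps=1e-8):
    """Approximate the geometric median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean of the points.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(num_iters):
        # Inverse-distance weights; eps guards against division by zero.
        weights = [1.0 / max(math.dist(p, estimate), eps) for p in points]
        total = sum(weights)
        new_estimate = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        if math.dist(new_estimate, estimate) < eps:
            return new_estimate
        estimate = new_estimate
    return estimate


def gmom(samples, num_blocks):
    """Geometric median of block means (hypothetical helper name)."""
    return geometric_median(block_means(samples, num_blocks))


# Toy "per-sample gradients": mostly near (1, -2), plus a few extreme outliers.
rng = random.Random(0)
grads = [[rng.gauss(1.0, 0.1), rng.gauss(-2.0, 0.1)] for _ in range(95)]
grads += [[1000.0, 1000.0] for _ in range(5)]

plain_mean = [sum(g[d] for g in grads) / len(grads) for d in range(2)]
robust = gmom(grads, num_blocks=20)
print("plain mean:", plain_mean)
print("GMOM      :", robust)
```

In this toy setup the plain mean is dragged far from (1, -2) by the five outliers, while the block-wise geometric median stays close to the bulk of the samples. That is the kind of behavior one would want from a drop-in replacement for clipping, although how the estimator is wired into PPO's update is not something this sketch attempts to show.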
