政策优化与肉皮镜源 (Policy Optimization with Stochastic Mirror Descent) - 专知论文

会员服务 ·

0

Extensibility · 样本 · 优化器 · 样本复杂度 · 驻点 ·

2021 年 12 月 3 日

Policy Optimization with Stochastic Mirror Descent

翻译：政策优化与肉皮镜源

Long Yang,Yu Zhang,Gang Zheng,Qian Zheng,Pengfei Li,Jun Wen,Gang Pan

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization. The extensive experimental results demonstrate that $\mathtt{VRMPO}$ outperforms the state-of-the-art policy gradient methods in various settings.

翻译：提高抽样效率一直是加强学习的一个长期目标。本文提出了 $\ matht{ VRMPO} 算法 : 一种具有随机镜像底部的抽样有效政策梯度方法。在 $\ matht{ VRMPO} $ 中, 提出了一个新的差异变换政策梯度估计值, 以提高抽样效率。我们证明, $\ matht{ VRMPO} 的建议只需要 $\ mathcal{ O} (\ psilon}-3} ) 美元样本轨迹来实现 $\ explon$- 近似第一阶固定点, 与政策优化的最佳样本复杂性相匹配。广泛的实验结果表明, $\ matht{ VRMPO} 显示, $matht{ VRMPO} 美元在各种环境下都比最先进的政策梯度方法要好。

0

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【Google】梯度下降，48页ppt

【Google】梯度下降，48页ppt

专知会员服务

81+阅读 · 2020年12月5日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

59+阅读 · 2020年11月21日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

最新《自动机器学习》综述论文，AutoML: A Survey of the State-of-the-Art

最新《自动机器学习》综述论文，AutoML: A Survey of the State-of-the-Art

专知会员服务

93+阅读 · 2020年7月10日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

专知会员服务

44+阅读 · 2019年10月28日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

“CVPR 2020 接受论文列表 1470篇论文都在这了

“CVPR 2020 接受论文列表 1470篇论文都在这了

专知

71+阅读 · 2020年6月10日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Policy Optimization for Stochastic Shortest Path

Arxiv

0+阅读 · 2022年2月7日

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

Arxiv

0+阅读 · 2022年2月7日

Convergence of a robust deep FBSDE method for stochastic control

Arxiv

0+阅读 · 2022年2月6日

Optimal Algorithms for Decentralized Stochastic Variational Inequalities

Arxiv

0+阅读 · 2022年2月6日

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

Arxiv

0+阅读 · 2022年2月3日

Escape saddle points by a simple gradient-descent based algorithm

Arxiv

4+阅读 · 2021年11月28日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

8+阅读 · 2021年4月22日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

VIP会员

文章信息

相关主题

样本复杂度

相关VIP内容

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【Google】梯度下降，48页ppt

【Google】梯度下降，48页ppt

专知会员服务

81+阅读 · 2020年12月5日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

59+阅读 · 2020年11月21日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

最新《自动机器学习》综述论文，AutoML: A Survey of the State-of-the-Art

最新《自动机器学习》综述论文，AutoML: A Survey of the State-of-the-Art

专知会员服务

93+阅读 · 2020年7月10日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

专知会员服务

44+阅读 · 2019年10月28日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【斯坦福博士论文】基础模型后训练的新方法

欧盟防务准备路线图：目标、冲突与2030之路（附“2030年防务准备路线图”原文）

【AAAI2026】模型不确定性下的在线鲁棒规划：一种基于采样的方法

Transformers 出现以来关系抽取任务的系统综述

相关资讯

“CVPR 2020 接受论文列表 1470篇论文都在这了

“CVPR 2020 接受论文列表 1470篇论文都在这了

专知

71+阅读 · 2020年6月10日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Policy Optimization for Stochastic Shortest Path

Arxiv

0+阅读 · 2022年2月7日

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

Arxiv

0+阅读 · 2022年2月7日

Convergence of a robust deep FBSDE method for stochastic control

Arxiv

0+阅读 · 2022年2月6日

Optimal Algorithms for Decentralized Stochastic Variational Inequalities

Arxiv

0+阅读 · 2022年2月6日

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

Arxiv

0+阅读 · 2022年2月3日

Escape saddle points by a simple gradient-descent based algorithm

Arxiv

4+阅读 · 2021年11月28日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

8+阅读 · 2021年4月22日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

微信扫码咨询专知VIP会员