MDPs 平行蒸汽镜底 (Parallel Stochastic Mirror Descent for MDPs) - 专知论文

会员服务 ·

0

估计/估计量 · CASE · 生成模型 · 优化器 · Processing（编程语言） ·

2021 年 4 月 13 日

Parallel Stochastic Mirror Descent for MDPs

翻译：MDPs 平行蒸汽镜底

Daniil Tiapkin,Fedor Stonyakin,Alexander Gasnikov

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for average-reward MDPs with a generative model. One of the main features of the presented method is low communication costs in a distributed centralized setting.

翻译：我们考虑了学习无限象子Markov(MDPs)决策程序的最佳政策的问题。为此,针对利普施奇茨连续功能的细微编程问题,我们建议了Stochastic Mirror Spores的某种变种。一个重要的细节是使用功能限制的不精确值的能力。我们分析了一般情况下的这种算法,并获得了在方法运行期间没有累积错误的趋同率的估计值。我们使用这种算法,我们获得了具有基因模型的平均奖励 MDP的首种平行算法。所介绍的方法的主要特征之一是分布式集中环境中的低通信成本。

0

相关内容

估计/估计量

估计/估计量

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

已删除

将门创投

4+阅读 · 2020年6月12日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Neograd: Near-Ideal Gradient Descent

Neograd: Near-Ideal Gradient Descent

Arxiv

0+阅读 · 2021年6月7日

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Arxiv

0+阅读 · 2021年6月7日

Mirror Descent Policy Optimization

Arxiv

0+阅读 · 2021年6月7日

Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

Arxiv

0+阅读 · 2021年6月6日

Federated Accelerated Stochastic Gradient Descent

Arxiv

0+阅读 · 2021年6月5日

VIP会员

文章信息

相关主题

估计/估计量

Processing（编程语言）

相关VIP内容

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于AI的动态任务分配策略实现多智能体系统有意义人类控制》报告

《超越连接：AI驱动网络未来愿景》最新报告

人工智能赋能多域作战：能力与挑战

《战场空间决策优势：AI基础与应用研究》总结报告

相关资讯

已删除

将门创投

4+阅读 · 2020年6月12日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Neograd: Near-Ideal Gradient Descent

Neograd: Near-Ideal Gradient Descent

Arxiv

0+阅读 · 2021年6月7日

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Arxiv

0+阅读 · 2021年6月7日

Mirror Descent Policy Optimization

Arxiv

0+阅读 · 2021年6月7日

Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

Arxiv

0+阅读 · 2021年6月6日

Federated Accelerated Stochastic Gradient Descent

Arxiv

0+阅读 · 2021年6月5日

微信扫码咨询专知VIP会员