Bregman在正规化的Markov决策过程中的分歧和价值之间的联系 (On the connection between Bregman divergence and value in regularized Markov decision processes) - 专知论文

会员服务 ·

0

Markov · 正则化项 · 散度 · Processing（编程语言） · 泛函 ·

2022 年 10 月 28 日

On the connection between Bregman divergence and value in regularized Markov decision processes

翻译：Bregman在正规化的Markov决策过程中的分歧和价值之间的联系

Brendan O'Donoghue

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.

翻译：在这个简短的注释中,我们得出了布雷格曼人从现行政策到最佳政策的分歧与正常的马尔科夫决策程序中当前价值功能的不优化之间的关系。这一结果对多任务强化学习、离线强化学习和功能近似法下的遗憾分析等产生了影响。

0

相关内容

Markov

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

专知会员服务

26+阅读 · 2019年12月25日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

B与H离子共注入剥离SiC晶体波导特性的研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于SiPM的高性能In-Beam TOF-PET的研究

国家自然科学基金

0+阅读 · 2014年12月31日

稀土离子与Si注入ZnO晶体的荧光加强特性研究

国家自然科学基金

0+阅读 · 2014年12月31日

气溶胶谱分布地基反演算法改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

二亚硝基哌嗪（DNP）介导Clusterin表达参与鼻咽癌转移的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

半导体量子点阵列中狄拉克电子特性调控的理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Learning and Extrapolation of Robotic Skills using Task-Parameterized Equation Learner Networks

Arxiv

0+阅读 · 2022年12月16日

Parameter estimation of the homodyned K distribution based on neural networks and trainable fractional-order moments

Arxiv

0+阅读 · 2022年12月16日

Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Arxiv

0+阅读 · 2022年12月15日

Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet

Arxiv

0+阅读 · 2022年12月15日

Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability

Arxiv

0+阅读 · 2022年12月15日

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Arxiv

0+阅读 · 2022年12月15日

Reinforcement Learning in System Identification

Arxiv

0+阅读 · 2022年12月14日

Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Arxiv

0+阅读 · 2022年12月14日

The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

Arxiv

0+阅读 · 2022年12月13日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

101+阅读 · 2022年5月11日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

专知会员服务

26+阅读 · 2019年12月25日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美军条令：小部队指挥官山地作战指南》最新238页

新型数字杀伤链：理解综合战术网络对野战炮兵体系的能力与效益

《俄乌战争中俄罗斯两栖作战能力：黑海舰队战力与作战失利研究》2025年最新111页

《任务式指挥十六个案例研究》232页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Learning and Extrapolation of Robotic Skills using Task-Parameterized Equation Learner Networks

Arxiv

0+阅读 · 2022年12月16日

Parameter estimation of the homodyned K distribution based on neural networks and trainable fractional-order moments

Arxiv

0+阅读 · 2022年12月16日

Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Arxiv

0+阅读 · 2022年12月15日

Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet

Arxiv

0+阅读 · 2022年12月15日

Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability

Arxiv

0+阅读 · 2022年12月15日

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Arxiv

0+阅读 · 2022年12月15日

Reinforcement Learning in System Identification

Arxiv

0+阅读 · 2022年12月14日

Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Arxiv

0+阅读 · 2022年12月14日

The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

Arxiv

0+阅读 · 2022年12月13日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

101+阅读 · 2022年5月11日

相关基金

B与H离子共注入剥离SiC晶体波导特性的研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于SiPM的高性能In-Beam TOF-PET的研究

国家自然科学基金

0+阅读 · 2014年12月31日

稀土离子与Si注入ZnO晶体的荧光加强特性研究

国家自然科学基金

0+阅读 · 2014年12月31日

气溶胶谱分布地基反演算法改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

二亚硝基哌嗪（DNP）介导Clusterin表达参与鼻咽癌转移的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

半导体量子点阵列中狄拉克电子特性调控的理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员