强化学习中的连续时间q学习 (q-Learning in Continuous Time) - 专知论文

会员服务 ·

0

Q学习 · Q函数 · 算法 · PG · 离散 ·

2023 年 4 月 18 日

q-Learning in Continuous Time

翻译：强化学习中的连续时间q学习

Yanwei Jia,Xun Yu Zhou

from arxiv, 64 pages, 4 figures

We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term ``(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a ``q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.

翻译：我们研究了Wang等人(2020)引入的在熵正则化下的探索性扩散过程框架下的强化学习(q学习)的连续时间版本。由于传统的Q函数在连续时间中崩溃，因此我们考虑其一阶近似，并称之为“小q函数”。这个函数与瞬时优势率函数以及哈密顿量有关。我们围绕q函数开发了一个与时间离散化无关的“q学习”理论。在一定的随机策略下，我们借助某些随机过程的鞅条件共同刻画了相应的q函数和值函数，既适用于on-policy，也适用于off-policy设置。然后，我们应用该理论，根据从q函数生成的Gibbs测度的密度函数是否能够被显式计算来设计不同的演员-评论家算法以解决潜在的强化学习问题。我们的一种算法解释了SARSA的著名Q学习算法，另一种则恢复了Jia和Zhou(2022b)提出的基于策略梯度(PG)的连续时间算法。最后，我们进行模拟实验，将我们的算法与Jia和Zhou(2022b)的基于PG的算法以及时间离散的传统Q学习算法进行了比较。

0

相关内容

Q学习

斯坦福最新《强化学习》2023课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2023课程，Emma Brunskill主讲，附PPT下载

专知会员服务

45+阅读 · 2023年1月17日

斯坦福大学最新【强化学习】2022课程，含ppt

斯坦福大学最新【强化学习】2022课程，含ppt

专知会员服务

131+阅读 · 2022年2月27日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

专知会员服务

77+阅读 · 2021年1月23日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

专知会员服务

25+阅读 · 2020年2月28日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【强化学习论文推荐集合】2019年必读的10篇TOP强化学习论文，My Top 10 Deep RL Papers of 2019

【强化学习论文推荐集合】2019年必读的10篇TOP强化学习论文，My Top 10 Deep RL Papers of 2019

专知会员服务

42+阅读 · 2020年1月15日

【伯克利博士论文】如何让机器人多技能？通过最大熵强化学习(107页pdf)

【伯克利博士论文】如何让机器人多技能？通过最大熵强化学习(107页pdf)

专知会员服务

78+阅读 · 2019年10月27日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

17种深度强化学习算法用Pytorch实现

17种深度强化学习算法用Pytorch实现

新智元

31+阅读 · 2019年9月16日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

强化学习初探 - 从多臂老虎机问题说起

强化学习初探 - 从多臂老虎机问题说起

专知

10+阅读 · 2018年4月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于重要性采样的并行离策略强化学习方法研究

国家自然科学基金

23+阅读 · 2015年12月31日

无限闭凸集族凸可行性问题中投影算法的线性收敛

国家自然科学基金

0+阅读 · 2015年12月31日

刚性微分方程高阶隐式离散解的快速迭代算法

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

连续时间马氏决策过程均值-方差优化问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

不确定环境下强化学习和决策的神经机制

国家自然科学基金

11+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

时域连续的高维Monte Carlo绘制技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

编织C/SiC复合材料的各向异性损伤演化及本构关系研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于多智能体强化学习的多机器人系统研究

国家自然科学基金

48+阅读 · 2009年12月31日

Explore to Generalize in Zero-Shot RL

Arxiv

0+阅读 · 2023年6月5日

Automated Importance Sampling via Optimal Control for Stochastic Reaction Networks: A Markovian Projection-based Approach

Arxiv

0+阅读 · 2023年6月5日

Active Reinforcement Learning under Limited Visual Observability

Arxiv

0+阅读 · 2023年6月1日

Non-stationary Reinforcement Learning under General Function Approximation

Arxiv

0+阅读 · 2023年6月1日

Policy Optimization for Continuous Reinforcement Learning

Arxiv

0+阅读 · 2023年6月1日

On Preemption and Learning in Stochastic Scheduling

On Preemption and Learning in Stochastic Scheduling

Arxiv

0+阅读 · 2023年6月1日

Normalization Enhances Generalization in Visual Reinforcement Learning

Arxiv

0+阅读 · 2023年6月1日

Reinforcement Learning on Graph: A Survey

Arxiv

67+阅读 · 2022年4月13日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

VIP会员

文章信息

相关主题

相关VIP内容

斯坦福最新《强化学习》2023课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2023课程，Emma Brunskill主讲，附PPT下载

专知会员服务

45+阅读 · 2023年1月17日

斯坦福大学最新【强化学习】2022课程，含ppt

斯坦福大学最新【强化学习】2022课程，含ppt

专知会员服务

131+阅读 · 2022年2月27日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

专知会员服务

77+阅读 · 2021年1月23日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

专知会员服务

25+阅读 · 2020年2月28日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【强化学习论文推荐集合】2019年必读的10篇TOP强化学习论文，My Top 10 Deep RL Papers of 2019

【强化学习论文推荐集合】2019年必读的10篇TOP强化学习论文，My Top 10 Deep RL Papers of 2019

专知会员服务

42+阅读 · 2020年1月15日

【伯克利博士论文】如何让机器人多技能？通过最大熵强化学习(107页pdf)

【伯克利博士论文】如何让机器人多技能？通过最大熵强化学习(107页pdf)

专知会员服务

78+阅读 · 2019年10月27日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《具备集体态势感知能力的深度强化学习智能体在超视距空战中的应用研究》最新文献

《美军条令文件：频谱管理操作技术》2025最新100页

反制小型无人机：一项重大挑战

《AI作战：将人机协作集成至实时、虚拟与建构环境（LVC）的建模与仿真》

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

17种深度强化学习算法用Pytorch实现

17种深度强化学习算法用Pytorch实现

新智元

31+阅读 · 2019年9月16日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

强化学习初探 - 从多臂老虎机问题说起

强化学习初探 - 从多臂老虎机问题说起

专知

10+阅读 · 2018年4月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Explore to Generalize in Zero-Shot RL

Arxiv

0+阅读 · 2023年6月5日

Automated Importance Sampling via Optimal Control for Stochastic Reaction Networks: A Markovian Projection-based Approach

Arxiv

0+阅读 · 2023年6月5日

Active Reinforcement Learning under Limited Visual Observability

Arxiv

0+阅读 · 2023年6月1日

Non-stationary Reinforcement Learning under General Function Approximation

Arxiv

0+阅读 · 2023年6月1日

Policy Optimization for Continuous Reinforcement Learning

Arxiv

0+阅读 · 2023年6月1日

On Preemption and Learning in Stochastic Scheduling

On Preemption and Learning in Stochastic Scheduling

Arxiv

0+阅读 · 2023年6月1日

Normalization Enhances Generalization in Visual Reinforcement Learning

Arxiv

0+阅读 · 2023年6月1日

Reinforcement Learning on Graph: A Survey

Arxiv

67+阅读 · 2022年4月13日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

相关基金

基于重要性采样的并行离策略强化学习方法研究

国家自然科学基金

23+阅读 · 2015年12月31日

无限闭凸集族凸可行性问题中投影算法的线性收敛

国家自然科学基金

0+阅读 · 2015年12月31日

刚性微分方程高阶隐式离散解的快速迭代算法

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

连续时间马氏决策过程均值-方差优化问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

不确定环境下强化学习和决策的神经机制

国家自然科学基金

11+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

时域连续的高维Monte Carlo绘制技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

编织C/SiC复合材料的各向异性损伤演化及本构关系研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于多智能体强化学习的多机器人系统研究

国家自然科学基金

48+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员