Soft Q Network - 专知 (Zhuanzhi) Papers


December 14, 2020

Jingbin Liu, Shuai Liu, Xinyang Gu

Deep Q Network (DQN) is a very successful algorithm, yet the inherent problem of reinforcement learning, i.e., the exploration-exploitation balance, remains. In this work, we introduce entropy regularization into DQN and propose the Soft Q Network (SQN). We find that the backup equation of soft Q learning can enjoy corrective feedback if we view the soft backup as policy improvement in the form of Q, instead of policy evaluation. We show that Soft Q Learning with Corrective Feedback (SQL-CF) underlies the on-policy nature of SQL and the equivalence of SQL and Soft Policy Gradient (SPG). With these insights, we propose an on-policy version of the deep Q learning algorithm, i.e., Q On-Policy (QOP). We experiment with QOP on a self-play environment called Google Research Football (GRF). The QOP algorithm exhibits great stability and efficiency in training GRF agents.
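
To make the idea of an entropy-regularized (soft) backup concrete, here is a minimal sketch of the standard soft Q-learning target from the maximum-entropy RL literature. It is not taken from the paper: the abstract does not give the exact SQN or QOP update, so the function names (`soft_value`, `boltzmann_policy`, `soft_q_target`), the temperature `alpha`, and the discount `gamma` are all illustrative assumptions. The sketch replaces the hard `max` of a DQN target with a temperature-scaled log-sum-exp over actions.

```python
import math


def soft_value(q_values, alpha):
    """Soft state value V(s) = alpha * log(sum_a exp(Q(s, a) / alpha)).

    `alpha` is an assumed entropy temperature; as alpha -> 0 this
    approaches max_a Q(s, a), i.e. the usual DQN backup.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def boltzmann_policy(q_values, alpha):
    """Policy pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def soft_q_target(reward, next_q_values, done, gamma=0.99, alpha=0.1):
    """One-step soft backup target: r + gamma * V_soft(s')."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * soft_value(next_q_values, alpha)


# Illustrative usage with made-up Q-values for three actions.
next_q = [1.0, 2.0, 0.5]
print(soft_q_target(0.0, next_q, done=False))
print(boltzmann_policy(next_q, alpha=0.5))
```

A larger `alpha` spreads probability across actions (more exploration), while a smaller one concentrates it on the greedy action, which is one way to read the exploration-exploitation balance the abstract refers to.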
