NeurWIN:无休止强盗通过深RL的神经Whattle指数网络 (NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL) - 专知论文

会员服务 ·

0

赌博机/老虎机 · Performer · Networking · 优化器 · Bandits ·

2021 年 10 月 5 日

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

翻译：NeurWIN:无休止强盗通过深RL的神经Whattle指数网络

Khaled Nakhleh,Santosh Ganji,Ping-Chun Hsieh,I-Hong Hou,Srinivas Shakkottai

Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning for the training of NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance for three recently studied restless bandit problems. Our experiment results show that the performance of NeurWIN is significantly better than other RL algorithms.

翻译：Whittle 指数政策是获得对臭名昭著的无动于衷的土匪问题无异的最佳最佳解决办法的有力工具。然而,找到Whittle指数对于许多实际的无动于衷的土匪来说仍然是一个棘手的问题。本文提议了NeurWIN,这是一个神经惠特尔指数网络,通过利用Whittle指数的数学特性,为任何无动于衷的土匪学习Whittle指数。我们显示,生成Whittle指数的神经网络也是对一组Markov决策问题产生最佳控制的神经网络。这一属性利用深入强化学习对NeurWIN进行训练的动力。我们通过评估NeurWINWIN对最近研究的3个无动于衷的土匪问题的表现,展示了NeurWIN的效用。我们的实验结果表明,NeurWIN的性能比其他RL算法要好得多。

0

相关内容

赌博机/老虎机

赌博机/老虎机

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【推荐论文】随机加权神经网络中隐藏着什么? （What’s Hidden in a Randomly Weighted Neural Network?），Vivek Ramanujan、Mitchell Wortsman、Aniruddha Kembhavi

【推荐论文】随机加权神经网络中隐藏着什么? （What’s Hidden in a Randomly Weighted Neural Network?），Vivek Ramanujan、Mitchell Wortsman、Aniruddha Kembhavi

专知会员服务

10+阅读 · 2019年12月5日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

【推荐】TensorFlow手把手CNN实践指南

【推荐】TensorFlow手把手CNN实践指南

机器学习研究会

5+阅读 · 2017年8月17日

BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule

Arxiv

1+阅读 · 2021年11月25日

Robust Differentiable SVD

Arxiv

9+阅读 · 2021年4月8日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

Deep Convolutional Networks as shallow Gaussian Processes

Arxiv

4+阅读 · 2018年8月16日

DARTS: Differentiable Architecture Search

Arxiv

3+阅读 · 2018年6月24日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

Logically-Constrained Reinforcement Learning

Arxiv

5+阅读 · 2018年4月22日

Learning to Extract Coherent Summary via Deep Reinforcement Learning

Arxiv

6+阅读 · 2018年4月19日

Cache-Enabled Dynamic Rate Allocation via Deep Self-Transfer Reinforcement Learning

Arxiv

4+阅读 · 2018年3月30日

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Arxiv

3+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【推荐论文】随机加权神经网络中隐藏着什么? （What’s Hidden in a Randomly Weighted Neural Network?），Vivek Ramanujan、Mitchell Wortsman、Aniruddha Kembhavi

【推荐论文】随机加权神经网络中隐藏着什么? （What’s Hidden in a Randomly Weighted Neural Network?），Vivek Ramanujan、Mitchell Wortsman、Aniruddha Kembhavi

专知会员服务

10+阅读 · 2019年12月5日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美陆军徒步机动作战条令手册》最新168页

【博士论文】基于不确定性的可靠性：现代机器学习中的选择性预测与可信部署

军事后勤数字化未来展望

《美海军后勤体系整合与创新挑战》最新报告

相关资讯

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

【推荐】TensorFlow手把手CNN实践指南

【推荐】TensorFlow手把手CNN实践指南

机器学习研究会

5+阅读 · 2017年8月17日

相关论文

BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule

Arxiv

1+阅读 · 2021年11月25日

Robust Differentiable SVD

Arxiv

9+阅读 · 2021年4月8日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

Deep Convolutional Networks as shallow Gaussian Processes

Arxiv

4+阅读 · 2018年8月16日

DARTS: Differentiable Architecture Search

Arxiv

3+阅读 · 2018年6月24日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

Logically-Constrained Reinforcement Learning

Arxiv

5+阅读 · 2018年4月22日

Learning to Extract Coherent Summary via Deep Reinforcement Learning

Arxiv

6+阅读 · 2018年4月19日

Cache-Enabled Dynamic Rate Allocation via Deep Self-Transfer Reinforcement Learning

Arxiv

4+阅读 · 2018年3月30日

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Arxiv

3+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员