Thompson 以信息放松处罚抽取样本 (Thompson Sampling with Information Relaxation Penalties) - 专知论文

会员服务 ·

0

Performer · INFORMS · 样本 · CC · 赌博机/老虎机 ·

2021 年 6 月 16 日

Thompson Sampling with Information Relaxation Penalties

翻译：Thompson 以信息放松处罚抽取样本

Seungki Min,Costis Maglaras,Ciamac C. Moallemi

We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework. With this framework, we define an intuitive family of control policies that include Thompson sampling (TS) and the Bayesian optimal policy as endpoints. Analogous to TS, which, at each decision epoch pulls an arm that is best with respect to the randomly sampled parameters, our algorithms sample entire future reward realizations and take the corresponding best action. However, this is done in the presence of "penalties" that seek to compensate for the availability of future information. We develop several novel policies and performance bounds for MAB problems that vary in terms of improving performance and increasing computational complexity between the two endpoints. Our policies can be viewed as natural generalizations of TS that simultaneously incorporate knowledge of the time horizon and explicitly consider the exploration-exploitation trade-off. We prove associated structural results on performance bounds and suboptimality gaps. Numerical experiments suggest that this new class of policies perform well, in particular in settings where the finite time horizon introduces significant exploration-exploitation tension into the problem. Finally, inspired by the finite-horizon Gittins index, we propose an index policy that builds on our framework that particularly outperforms the state-of-the-art algorithms in our numerical experiments.

翻译：我们认为,在巴伊西亚环境下,我们考虑的是一种限值和多武装土匪(MAB)问题,为此,我们提议了一个信息放松抽样框架。在这个框架内,我们定义了一套直观的控制政策,包括Thompson抽样(TS)和巴伊西亚最佳政策,作为终点。与TS相比,我们的政策可以被视为TS的自然概括,它既包含对时间范围的知识,又明确考虑探索-开发交易。我们在业绩约束和次优化差距上证明了相关的结构结果。数字实验表明,这一新政策类别运行良好,特别是在我们不断变异的轨迹上,我们最终在探索时平面上展示了我们不确定的时间框架。

0

相关内容

Performer

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

最新《生成式对抗网络数学导论》，30页pdf

最新《生成式对抗网络数学导论》，30页pdf

专知会员服务

79+阅读 · 2020年9月3日

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

252+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Non-Asymptotic Bounds for the $\ell_{\infty}$ Estimator in Linear Regression with Uniform Noise

Arxiv

0+阅读 · 2021年8月17日

Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

Arxiv

0+阅读 · 2021年8月16日

Batched Thompson Sampling for Multi-Armed Bandits

Arxiv

0+阅读 · 2021年8月15日

Adaptive optimal estimation of irregular mean and covariance functions

Arxiv

0+阅读 · 2021年8月14日

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

Arxiv

0+阅读 · 2021年8月13日

Parameter-Based Value Functions

Arxiv

0+阅读 · 2021年8月13日

Deterministic factoring with oracles

Arxiv

0+阅读 · 2021年8月13日

Asymptotic optimality and minimal complexity of classification by random projection

Arxiv

0+阅读 · 2021年8月11日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

最新《生成式对抗网络数学导论》，30页pdf

最新《生成式对抗网络数学导论》，30页pdf

专知会员服务

79+阅读 · 2020年9月3日

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

252+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《具备集体态势感知能力的深度强化学习智能体在超视距空战中的应用研究》最新文献

《美军条令文件：频谱管理操作技术》2025最新100页

反制小型无人机：一项重大挑战

《AI作战：将人机协作集成至实时、虚拟与建构环境（LVC）的建模与仿真》

相关资讯

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Non-Asymptotic Bounds for the $\ell_{\infty}$ Estimator in Linear Regression with Uniform Noise

Arxiv

0+阅读 · 2021年8月17日

Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

Arxiv

0+阅读 · 2021年8月16日

Batched Thompson Sampling for Multi-Armed Bandits

Arxiv

0+阅读 · 2021年8月15日

Adaptive optimal estimation of irregular mean and covariance functions

Arxiv

0+阅读 · 2021年8月14日

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

Arxiv

0+阅读 · 2021年8月13日

Parameter-Based Value Functions

Arxiv

0+阅读 · 2021年8月13日

Deterministic factoring with oracles

Arxiv

0+阅读 · 2021年8月13日

Asymptotic optimality and minimal complexity of classification by random projection

Arxiv

0+阅读 · 2021年8月11日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员