In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While the existing literature tends to focus on optimism-based algorithms, which have been shown to be suboptimal in many cases, our approach carefully plans which action to take by balancing the tradeoff between information gain and reward, overcoming the failures of optimism. In addition, we leverage tools from the theory of suprema of empirical processes to obtain regret guarantees that scale with the Gaussian width of the action set, avoiding wasteful union bounds. We provide state-of-the-art finite-time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regimes. In the combinatorial semi-bandit setting, we show that our algorithm is computationally efficient and relies only on calls to a linear maximization oracle. We further show that, with a slight modification, our algorithm can be used for pure exploration, obtaining state-of-the-art pure exploration guarantees in the semi-bandit setting. Finally, we provide, to the best of our knowledge, the first example where optimism fails in the semi-bandit regime, and show that in this setting our algorithm succeeds.
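As background for the experimental-design machinery the abstract refers to, the sketch below computes a G-optimal design over a finite action set via the Frank-Wolfe method, a standard way of allocating samples so that the worst-case estimation variance is minimized. This is a generic illustration under an assumed finite action set, not the algorithm proposed in the paper; the function name and setup are hypothetical.

```python
import numpy as np

def g_optimal_design(X, n_iters=1000, reg=1e-6):
    """Frank-Wolfe iteration for an approximate G-optimal design.

    X: (K, d) array whose rows are the actions. Returns a distribution lam
    over the rows of X approximately minimizing
        max_x  x^T A(lam)^{-1} x,   A(lam) = sum_i lam_i x_i x_i^T.
    By the Kiefer-Wolfowitz theorem the optimal value equals d, so the
    design concentrates samples on informative directions rather than
    spreading them uniformly.
    """
    K, d = X.shape
    lam = np.full(K, 1.0 / K)                  # start from the uniform design
    for t in range(n_iters):
        A = X.T @ (lam[:, None] * X) + reg * np.eye(d)
        A_inv = np.linalg.inv(A)
        # variance proxy ||x||^2_{A^{-1}} for every action
        g = np.einsum('ij,jk,ik->i', X, A_inv, X)
        k = int(np.argmax(g))                  # most uncertain direction
        gamma = 2.0 / (t + 2)                  # standard Frank-Wolfe step size
        lam = (1 - gamma) * lam
        lam[k] += gamma
    return lam

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))           # hypothetical action set
    lam = g_optimal_design(X)
    A = X.T @ (lam[:, None] * X)
    # Kiefer-Wolfowitz: the worst-case variance proxy should approach d = 5
    print("max variance proxy:", max(x @ np.linalg.solve(A, x) for x in X))
```

Pure experimental design of this kind ignores rewards; the abstract's point is that the proposed method instead balances such information-gain considerations against reward while planning which action to play.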