On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits - 专知 (Zhuanzhi) Papers


October 16, 2021

On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits


Zixin Zhong,Wang Chi Cheung,Vincent Y. F. Tan

From arXiv, 27 pages, 8 figures

We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB$({\gamma})$ algorithm, which achieves order-wise optimal performance for RM or BAI under different values of ${\gamma}$. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB$({\gamma})$ achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that when applied to difficult instances, BoBW-lil'UCB outperforms a close competitor UCB$_{\alpha}$ (Degenne et al., 2019), which is designed for RM and BAI with a fixed confidence.
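To make the trade-off described in the abstract concrete, here is a minimal sketch of a generic UCB-style index policy with a tunable exploration parameter. It is not the authors' BoBW-lil'UCB$({\gamma})$ algorithm, whose confidence bounds are not given in the abstract. The function name `run_ucb`, the confidence radius $\sqrt{\gamma \log t / n}$, the Bernoulli rewards, and the rule of recommending the arm with the highest empirical mean are all assumptions chosen for illustration. The sketch tracks both quantities the abstract contrasts: the cumulative pseudo-regret (RM) and the arm recommended at the end of a fixed horizon (BAI).

```python
import math
import random


def run_ucb(means, horizon, gamma, seed=0):
    """Run a generic UCB-style policy on Bernoulli arms for a fixed horizon.

    `gamma` scales the confidence radius (an illustrative parameterization,
    not the one used in the paper). Returns a tuple
    (pseudo_regret, recommended_arm, pull_counts).
    """
    k = len(means)
    if horizon < k:
        raise ValueError("horizon must be at least the number of arms")

    rng = random.Random(seed)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total reward collected per arm
    best_mean = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once to initialize the estimates
        else:
            # Index = empirical mean + gamma-scaled confidence radius.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(gamma * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]  # pseudo-regret of this pull

    # Fixed-horizon BAI output: the arm with the highest empirical mean.
    recommended = max(range(k), key=lambda i: sums[i] / counts[i])
    return regret, recommended, counts


if __name__ == "__main__":
    arm_means = [0.5, 0.45, 0.4]  # hypothetical instance
    for gamma in (0.5, 2.0, 8.0):
        regret, recommended, counts = run_ucb(arm_means, 2000, gamma)
        print(gamma, round(regret, 1), recommended, counts)
```

In this sketch a larger `gamma` widens the confidence radius, so suboptimal arms are pulled more often. That typically raises the regret while giving the final recommendation more samples per arm, which is the tension between RM and BAI that the paper quantifies.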

