Bandits with Knapsacks beyond the Worst-Case Analysis - Zhuanzhi Papers

May 3, 2021

Bandits with Knapsacks beyond the Worst-Case Analysis

Karthik Abinav Sankararaman, Aleksandrs Slivkins

From arXiv. The initial version, titled "Advances in Bandits with Knapsacks", was published on arxiv.org in Jan'20. The present version improves both upper and lower bounds, deriving Theorem 3.2(ii) and Theorem 4.2. Moreover, it simplifies the algorithm and analysis in the main result, and fixes several issues in the lower bounds.

"Bandits with Knapsacks" (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization of logarithmic, instance-dependent regret rates. Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general template for extensions from bandits to BwK which takes advantage of some known helpful structure, and apply this template to combinatorial semi-bandits and linear contextual bandits. Our results build on the BwK algorithm from (Agrawal and Devanur, 2014), providing new analyses thereof.
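To make the BwK model in the abstract concrete, here is a minimal sketch of the interaction protocol: in each round the algorithm picks an arm, collects a reward, and consumes some amount of each resource, and the episode ends when the time horizon is reached or any resource runs out. Everything in the block is an illustrative assumption and not taken from the paper: the function names (`simulate_bwk`, `uniform_policy`), the Bernoulli rewards and unit consumption, and in particular the uniform-random policy, which is only a placeholder and is not the algorithm of Agrawal and Devanur (2014).

```python
import random


def uniform_policy(t, num_arms, rng):
    """Placeholder policy (assumption): pick an arm uniformly at random."""
    return rng.randrange(num_arms)


def simulate_bwk(reward_probs, cost_probs, budgets, horizon, choose_arm, seed=0):
    """Run one BwK episode with Bernoulli rewards and unit resource consumption.

    reward_probs[a]  : probability that arm a yields reward 1
    cost_probs[a][j] : probability that arm a consumes one unit of resource j
    budgets[j]       : initial (positive integer) supply of resource j
    """
    rng = random.Random(seed)
    remaining = list(budgets)
    total_reward = 0
    rounds_played = 0
    for t in range(horizon):
        arm = choose_arm(t, len(reward_probs), rng)
        # Observe a Bernoulli reward for the chosen arm.
        if rng.random() < reward_probs[arm]:
            total_reward += 1
        # Consume each resource independently with the arm's probability.
        for j, p in enumerate(cost_probs[arm]):
            if rng.random() < p:
                remaining[j] -= 1
        rounds_played += 1
        # The episode stops as soon as any resource is exhausted.
        if any(r <= 0 for r in remaining):
            break
    return total_reward, rounds_played, remaining


# Two arms, one resource: arm 0 is cheap but weak, arm 1 is strong but costly.
reward, rounds, left = simulate_bwk(
    reward_probs=[0.3, 0.8],
    cost_probs=[[0.1], [0.9]],
    budgets=[20],
    horizon=200,
    choose_arm=uniform_policy,
)
```

The sketch shows why the budget changes the problem: the arm with the highest expected reward can also exhaust the supply fastest, so a good policy has to weigh reward against consumption rather than simply chase the best arm.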
