Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits - 专知 (Zhuanzhi) Papers

Topics: multi-armed bandits · optimizers · contextual · constraints · computational learning theory

October 12, 2021

Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits

Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where feedback is limited by a (possibly time-dependent) budget, and reward must be actively inquired for it to be observed. Previous works on this setting assumed a strict feedback budget and focused on not violating this constraint while providing problem-independent regret guarantees. In this work, we provide problem-dependent guarantees on both the regret and the asked feedback. In particular, we derive problem-dependent lower bounds on the required feedback and show that there is a fundamental difference between problems with a unique and multiple optimal arms. Furthermore, we present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds. Notably, we show that BuFALU naturally adapts to the number of optimal arms.
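
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not BuFALU and does not reproduce any bound from the paper; it is a naive baseline that runs a standard UCB index and spends feedback whenever the budget allows. The function name `run_budgeted_ucb`, the Bernoulli reward model, and the `budget(t)` callback (cumulative queries allowed up to round `t`) are assumptions made for illustration only.

```python
import math
import random


def run_budgeted_ucb(means, horizon, budget, seed=0):
    """Simulate a Bernoulli bandit where rewards are observed only when queried.

    `budget(t)` is a hypothetical callback giving the maximum cumulative
    number of feedback queries allowed up to round t (1-indexed).
    Returns (pseudo-regret, number of queries used).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of observed rewards per arm
    sums = [0.0] * k   # sum of observed rewards per arm
    queries = 0
    regret = 0.0
    best = max(means)

    def ucb(arm, t):
        # Arms without any observation get an infinite index.
        if counts[arm] == 0:
            return float("inf")
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return sums[arm] / counts[arm] + bonus

    for t in range(1, horizon + 1):
        arm = max(range(k), key=lambda a: ucb(a, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        regret += best - means[arm]

        # The reward is earned either way, but it is observed (and the
        # statistics updated) only if one more query fits in the budget.
        if queries + 1 <= budget(t):
            queries += 1
            counts[arm] += 1
            sums[arm] += reward

    return regret, queries


if __name__ == "__main__":
    # Time-dependent budget: at most 10 * sqrt(t) queries by round t.
    regret, used = run_budgeted_ucb(
        [0.3, 0.5, 0.7], horizon=5000, budget=lambda t: 10 * math.sqrt(t)
    )
    print(f"pseudo-regret={regret:.1f}, queries used={used}")
```

This baseline asks for feedback at every affordable round, so it never asks for less than the budget permits. The paper's contribution, by contrast, concerns problem-dependent bounds on how much feedback is actually needed.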
