Multi-armed Bandit Requiring Monotone Arm Sequences - Zhuanzhi Papers

Bandits · Objective function · ARM · Lipschitz continuity · Optimizer

June 7, 2021

Multi-armed Bandit Requiring Monotone Arm Sequences

In many online learning or multi-armed bandit problems, the taken actions or pulled arms are ordinal and required to be monotone over time. Examples include dynamic pricing, in which the firms use markup pricing policies to please early adopters and deter strategic waiting, and clinical trials, in which the dose allocation usually follows the dose escalation principle to prevent dose limiting toxicities. We consider the continuum-armed bandit problem when the arm sequence is required to be monotone. We show that when the unknown objective function is Lipschitz continuous, the regret is $O(T)$. When in addition the objective function is unimodal or quasiconcave, the regret is $\tilde O(T^{3/4})$ under the proposed algorithm, which is also shown to be the optimal rate. This deviates from the optimal rate $\tilde O(T^{2/3})$ in the continuous-armed bandit literature and demonstrates the cost to the learning efficiency brought by the monotonicity requirement.
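To make the monotonicity constraint concrete, here is a minimal Python sketch of an increasing-arm policy on $[0, 1]$. It is an illustration under stated assumptions, not the paper's algorithm: the function name `monotone_arm_sequence`, the grid size `K` of about $T^{1/4}$, the batch size `m` of about $T^{1/2}$, the Gaussian noise model, and the confidence-bound stopping rule are all choices made here for the example. The policy sweeps the grid upward, pulls each grid point `m` times, and commits to the current arm once its upper confidence bound falls below the best lower confidence bound seen so far, since a monotone sequence can never move back down.

```python
import math
import random


def monotone_arm_sequence(reward_fn, T, noise_sd=0.1, seed=0):
    """Return a non-decreasing list of T arms in [0, 1].

    Illustrative sketch only: grid size, batch size, and the stopping
    rule are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    K = max(2, math.ceil(T ** 0.25))   # number of grid steps (assumed ~T^(1/4))
    m = max(1, math.ceil(T ** 0.5))    # pulls per grid point (assumed ~T^(1/2))
    # Half-width of a Gaussian-style confidence interval for a batch mean.
    half_width = noise_sd * math.sqrt(2.0 * math.log(max(T, 2)) / m)

    arms = []                # arms pulled so far, in order
    best_lcb = -math.inf     # best lower confidence bound among earlier arms
    x = 0.0
    for k in range(K + 1):
        remaining = T - len(arms)
        if remaining == 0:
            break
        x = k / K            # arms only ever move upward
        n = min(m, remaining)
        total = 0.0
        for _ in range(n):
            total += reward_fn(x) + rng.gauss(0.0, noise_sd)
            arms.append(x)
        mean = total / n
        # Stop once the current arm looks clearly worse than an earlier one:
        # for a unimodal objective this suggests the peak has been passed.
        if mean + half_width < best_lcb:
            break
        best_lcb = max(best_lcb, mean - half_width)

    # Going back would break monotonicity, so stay at the current arm.
    arms.extend([x] * (T - len(arms)))
    return arms


if __name__ == "__main__":
    # Hypothetical unimodal objective peaking at x = 0.3.
    f = lambda x: 1.0 - abs(x - 0.3)
    arms = monotone_arm_sequence(f, T=10_000)
    print("final arm:", arms[-1])
```

With these assumed choices the sweep spends at most about $(K+1)\,m \approx T^{3/4}$ pulls before committing, and the committed arm overshoots the peak by roughly one grid step. That matches the order of the $\tilde O(T^{3/4})$ rate quoted above only heuristically; the sketch is not a proof and does not reproduce the paper's analysis.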

