Contextual Combinatorial Volatile Bandits via Gaussian Processes - 专知 (Zhuanzhi) Papers



October 5, 2021

Contextual Combinatorial Volatile Bandits via Gaussian Processes


Andi Nika, Sepehr Elahi, Cem Tekin

From arXiv, 33 pages, 7 figures

We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability. At the beginning of each round, the agent observes the set of available base arms and their contexts and then selects an action that is a feasible subset of the set of available base arms to maximize its cumulative reward in the long run. We assume that the mean outcomes of base arms are samples from a Gaussian Process indexed by the context set ${\cal X}$, and the expected reward is Lipschitz continuous in expected base arm outcomes. For this setup, we propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB) and prove that it incurs $\tilde{O}(K\sqrt{T\overline{\gamma}_{T}} )$ regret with high probability, where $\overline{\gamma}_{T}$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum cardinality of any feasible action over all rounds. To dramatically speed up the algorithm, we also propose a variant of O'CLOK-UCB that uses sparse GPs. Finally, we experimentally show that both algorithms exploit inter-base arm outcome correlation and vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
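To make the setup in the abstract concrete, here is a minimal sketch of one way a GP-based upper confidence bound could drive base arm selection when availability changes every round. It is not the authors' O'CLOK-UCB implementation. It assumes a squared-exponential kernel, a fixed exploration weight `beta`, a feasible set made of all subsets of exactly `k` available arms, and a reward equal to the sum of base arm outcomes, so that picking the `k` largest indices is an exact oracle. The names `rbf_kernel`, `gp_posterior`, `select_super_arm`, and `true_mean` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def gp_posterior(x_obs, y_obs, x_new, noise_var=0.01, lengthscale=0.5):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    if len(x_obs) == 0:
        # No data yet: fall back to the prior (mean 0, variance 1).
        return np.zeros(len(x_new)), np.ones(len(x_new))
    k_oo = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new, lengthscale)
    solved = np.linalg.solve(k_oo, k_on)       # K^{-1} k(x_obs, x_new)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)  # prior variance k(x, x) = 1
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_super_arm(x_obs, y_obs, contexts, k, beta=2.0):
    """Indices of the k available base arms with the largest UCB index.

    Assumption: any k-subset is feasible and the reward is additive,
    so a top-k rule maximizes the optimistic reward estimate.
    """
    mean, std = gp_posterior(x_obs, y_obs, contexts)
    ucb = mean + np.sqrt(beta) * std
    return np.argsort(-ucb)[:k]

# Toy simulation with a hypothetical mean outcome function over 2-D contexts.
rng = np.random.default_rng(0)
true_mean = lambda x: np.sin(3.0 * x[:, 0]) * np.cos(2.0 * x[:, 1])

x_obs = np.empty((0, 2))
y_obs = np.empty(0)
for t in range(20):
    n_avail = rng.integers(5, 10)              # volatile arm availability
    contexts = rng.uniform(0.0, 1.0, size=(n_avail, 2))
    chosen = select_super_arm(x_obs, y_obs, contexts, k=3)
    outcomes = true_mean(contexts[chosen]) + 0.1 * rng.standard_normal(len(chosen))
    x_obs = np.vstack([x_obs, contexts[chosen]])
    y_obs = np.concatenate([y_obs, outcomes])
```

Because the GP shares information across nearby contexts, an outcome observed for one base arm also tightens the confidence bound of arms with similar contexts, which is the inter-base arm correlation the abstract refers to. The exact posterior costs time cubic in the number of observations, which is one plausible motivation for the sparse GP variant mentioned above.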



