Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision-making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model to explicitly take option correlation into account. Specifically, in CMCB, a learner sequentially chooses weight vectors over given options and observes random feedback according to these decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured with the option covariance. To capture important reward observation scenarios in practice, we consider three feedback settings, i.e., full-information, semi-bandit and full-bandit feedback. We propose novel algorithms with optimal regrets (within logarithmic factors), and provide matching lower bounds to validate their optimality. Our experimental results also demonstrate the superiority of the proposed algorithms. To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.
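The mean-covariance trade-off at the heart of CMCB can be sketched in a few lines. Below is a minimal illustration assuming a *known* mean vector and covariance matrix with hypothetical values and a hypothetical risk-aversion parameter `beta`; in the actual bandit setting these quantities are unknown and must be estimated from the observed feedback.

```python
import numpy as np

def mean_covariance_objective(w, theta, Sigma, beta):
    """Expected reward w^T theta minus beta times the risk w^T Sigma w.

    Hypothetical objective form for illustration; the learner in CMCB
    does not observe theta or Sigma directly.
    """
    return w @ theta - beta * (w @ Sigma @ w)

# Hypothetical problem instance with 3 correlated options.
theta = np.array([0.5, 0.3, 0.8])            # mean rewards
Sigma = np.array([[ 0.2 , 0.05, -0.1],
                  [ 0.05, 0.1 ,  0.0],
                  [-0.1 , 0.0 ,  0.3]])      # option covariance (note correlations)
beta = 1.0                                   # risk-aversion weight

# Evaluate a few candidate weight vectors on the simplex.
candidates = [np.array([1.0, 0.0, 0.0]),     # all-in on option 0
              np.array([0.0, 0.0, 1.0]),     # all-in on option 2
              np.array([0.4, 0.2, 0.4])]     # a diversified mixture
scores = [mean_covariance_objective(w, theta, Sigma, beta) for w in candidates]
best = candidates[int(np.argmax(scores))]
```

Note that with correlated options the diversified mixture can outscore any single option, since negative correlations reduce the portfolio variance term; this is exactly the effect that per-option risk measures such as variance cannot capture.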