Topic: Locally Differentially Private (Contextual) Bandits Learning

Abstract:

First, we propose a simple black-box reduction framework that can solve a large family of context-free bandits learning problems with LDP guarantees. Based on this framework, we improve the previous best results for private bandits learning with one-point feedback (e.g., private Bandits Convex Optimization) and obtain the first result for BCO with multi-point feedback under LDP. The LDP guarantee and the black-box nature make our framework more attractive in practical applications than previous, specifically designed, and relatively weaker differentially private (DP) context-free bandits algorithms. Furthermore, we extend our algorithm to Generalized Linear Bandits, achieving a regret bound of Õ(T^{3/4}/ε) under (ε, δ)-LDP, which is conjectured to be optimal. Note that, given the existing Ω(T) lower bound for DP contextual linear bandits, our result reveals a fundamental difference between LDP and DP contextual bandits.
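For intuition only, the sketch below illustrates the kind of local randomization a black-box LDP reduction relies on: each user perturbs a bounded loss value before it ever reaches the learner, which can then run any noise-tolerant bandit algorithm on the perturbed feedback. The function name `ldp_perturb`, the loss range, and the Laplace calibration are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def ldp_perturb(loss, epsilon, loss_bound=1.0, rng=None):
    """Locally privatize one bounded loss value with the Laplace mechanism.

    For a loss in [-loss_bound, loss_bound], adding Laplace noise with
    scale (2 * loss_bound) / epsilon makes the user's report satisfy
    epsilon-LDP before it is sent to the learner.
    """
    if rng is None:
        rng = np.random.default_rng()
    sensitivity = 2.0 * loss_bound
    return loss + rng.laplace(scale=sensitivity / epsilon)

# The learner only ever sees ldp_perturb(loss, epsilon), so any bandit
# algorithm that tolerates zero-mean noise in its one-point feedback can
# be plugged in unchanged.
```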

Latest Papers

In this paper, we study the stochastic combinatorial multi-armed bandit problem under semi-bandit feedback. While much work has been done on algorithms that optimize the expected reward for linear as well as some general reward functions, we study a variant of the problem, where the objective is to be risk-aware. More specifically, we consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards. We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the combinatorial bandit for the two cases of Gaussian and bounded arm rewards. We further analyze these algorithms and provide regret bounds. We believe that our results provide the first theoretical insights into combinatorial semi-bandit problems in the risk-aware case.
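As a minimal sketch of the risk measure the abstract refers to (the helper `empirical_cvar` and the level `alpha` are illustrative assumptions, not the paper's algorithm): CVaR at level alpha averages only the worst alpha-fraction of rewards, so maximizing it favors super arms whose bad outcomes are still acceptable.

```python
import numpy as np

def empirical_cvar(rewards, alpha=0.05):
    """Empirical CVaR_alpha of rewards: mean of the worst alpha-fraction.

    For rewards where higher is better, this averages the lowest
    ceil(alpha * n) observations, capturing only worst-case behavior.
    """
    rewards = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(rewards))))
    return rewards[:k].mean()

# Two hypothetical super arms with the same mean reward (1.0) but
# different tails: CVaR separates them while the expectation cannot.
safe = empirical_cvar([0.9, 1.0, 1.1, 1.0], alpha=0.25)   # -> 0.9
risky = empirical_cvar([0.0, 1.5, 1.5, 1.0], alpha=0.25)  # -> 0.0
```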
