Doubly robust Thompson sampling for linear payoffs - 专知论文


April 30, 2023

Doubly robust Thompson sampling for linear payoffs


Wonyoung Kim, Gi-soo Kim, Myunghee Cho Paik

From arXiv. Accepted for NeurIPS 2021 (Spotlight).

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm, while the rewards of the other arms remain missing. The dependence of the arm choice on past context and reward pairs compounds the complexity of regret analysis. We propose a novel multi-armed contextual bandit algorithm, Doubly Robust (DR) Thompson Sampling, which applies the doubly robust estimator from the missing data literature to Thompson Sampling with contexts (LinTS). Unlike previous works that rely on missing data techniques (Dimakopoulou et al., 2019; Kim and Paik, 2019), the proposed algorithm is designed to allow a novel additive regret decomposition, leading to an improved regret bound of order $\tilde{O}(\phi^{-2}\sqrt{T})$, where $\phi^2$ is the minimum eigenvalue of the covariance matrix of contexts. This is the first regret bound for LinTS expressed in terms of $\phi^2$ without the context dimension $d$. Applying the relationship between $\phi^2$ and $d$, the regret bound of the proposed algorithm is $\tilde{O}(d\sqrt{T})$ in many practical scenarios, improving the bound of LinTS by a factor of $\sqrt{d}$. A benefit of the proposed method is that it uses all the context data, chosen or not, which makes it possible to circumvent the technical definition of unsaturated arms used in the theoretical analysis of LinTS. Empirical studies show the advantage of the proposed algorithm over LinTS.
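The abstract does not spell out the estimator, so the following is only a minimal sketch of the generic doubly robust imputation from the missing data literature, applied to one round of a linear bandit. The function names (`dr_pseudo_rewards`, `ridge_update`), the argument layout, and the ridge-style accumulation are illustrative assumptions, not the authors' implementation. The idea shown is that every arm receives a pseudo-reward: a model-based guess, plus an inverse-probability-weighted correction for the arm that was actually pulled, so the contexts of all arms can enter the parameter update.

```python
import numpy as np

def dr_pseudo_rewards(contexts, chosen, reward, probs, beta_hat):
    """Doubly robust pseudo-rewards for all arms in one round (hypothetical helper).

    contexts: (N, d) array, one context vector per arm
    chosen:   index of the arm that was pulled
    reward:   observed reward of the chosen arm
    probs:    (N,) selection probability of each arm, all strictly positive
    beta_hat: (d,) imputation estimate of the linear parameter
    """
    imputed = contexts @ beta_hat      # model-based guess for every arm
    pseudo = imputed.copy()
    # Only the chosen arm has an observed reward, so only it gets the
    # inverse-probability-weighted residual correction.
    pseudo[chosen] += (reward - imputed[chosen]) / probs[chosen]
    return pseudo

def ridge_update(V, b, contexts, pseudo):
    """Accumulate ridge statistics from the contexts of ALL arms (assumed update form)."""
    return V + contexts.T @ contexts, b + contexts.T @ pseudo

# Small illustration with noise-free rewards and a deliberately poor imputation.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))            # 3 arms, 2-dimensional contexts
beta_true = np.array([1.0, -0.5])
means = X @ beta_true
probs = np.array([0.2, 0.3, 0.5])
beta_bad = np.array([5.0, 5.0])

# Averaging the pseudo-rewards over the arm choice recovers the true means,
# even though beta_bad is far from beta_true.
expected = sum(probs[a] * dr_pseudo_rewards(X, a, means[a], probs, beta_bad)
               for a in range(3))
V, b = ridge_update(np.eye(2), np.zeros(2), X, expected)
beta_est = np.linalg.solve(V, b)
```

In this sketch the pseudo-rewards are unbiased whenever the selection probabilities are correct, regardless of the imputation estimate, and a correct imputation model gives the same guarantee when the probabilities are off. That pair of conditions is the sense in which the estimator is doubly robust.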

