We study the problem of \emph{dynamic regret minimization} in $K$-armed Dueling Bandits under non-stationary or time-varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary `win-loss' feedback for this pair, sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high-probability regret. We next use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarity. In particular, we establish $\tO(\sqrt{SKT})$ and $\tO(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic-regret guarantees, where $S$ is the total number of `effective switches' in the underlying preference relations and $V_T$ is a measure of `continuous-variation' non-stationarity. The complexity of these problems had not been studied prior to this work, despite the practical relevance of non-stationary environments in real-world systems. We justify the optimality of our algorithms by proving matching lower-bound guarantees under both of the above notions of non-stationarity. Finally, we corroborate our results with extensive simulations and compare the efficacy of our algorithms against state-of-the-art baselines.
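For concreteness, one common formalization of these quantities from the non-stationary dueling-bandit literature is sketched below; it assumes per-round preference matrices $P_t$, a per-round benchmark arm $a_t^*$ (e.g., a Condorcet winner of $P_t$), and counts plain preference-matrix switches, which may be a coarser notion than the `effective switches' appearing in our bounds.
\[
  DR(T) \;=\; \sum_{t=1}^{T} \frac{\delta_t(a_t^*, x_t) + \delta_t(a_t^*, y_t)}{2},
  \qquad \delta_t(i,j) \;=\; P_t(i,j) - \tfrac{1}{2},
\]
\[
  S \;=\; \sum_{t=2}^{T} \mathbf{1}\{P_t \neq P_{t-1}\},
  \qquad
  V_T \;=\; \sum_{t=2}^{T} \max_{i,j}\, \bigl| P_t(i,j) - P_{t-1}(i,j) \bigr|,
\]
where $(x_t, y_t)$ denotes the pair of arms dueled at round $t$.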