Dueling Bandits with Team Comparisons

July 6, 2021

Lee Cohen,Ulrike Schmidt-Kraepelin,Yishay Mansour

We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of $k$-sized teams from a universe of $n$ players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team which wins against any other disjoint team (with probability at least $1/2$). Noisy comparisons are linked to a total order on the teams. We formalize our model by building upon the dueling bandits setting (Yue et al., 2012) and provide several algorithms, both for stochastic and deterministic settings. For the stochastic setting, we provide a reduction to the classical dueling bandits setting, yielding an algorithm that identifies a Condorcet winning team within $\mathcal{O}((n + k \log (k)) \frac{\max(\log\log n, \log k)}{\Delta^2})$ duels, where $\Delta$ is a gap parameter. For deterministic feedback, we additionally present a gap-independent algorithm that identifies a Condorcet winning team within $\mathcal{O}(nk\log(k)+k^5)$ duels.
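To make the setting concrete, the following is a minimal sketch of a noisy duel oracle and a brute-force check for Condorcet winning teams. It is not the paper's algorithm. It rests on assumptions that do not come from the abstract: each player has a hypothetical numeric value, the total order on teams is induced by the sum of these values, and the stronger team wins each duel with a fixed probability of $1/2 + \text{gap}$. The names `make_duel`, `estimate_win_rate` and `condorcet_winning_teams` are illustrative only.

```python
import itertools
import random


def team_strength(values, team):
    """Assumed additive model: a team's strength is the sum of its players' values."""
    return sum(values[p] for p in team)


def make_duel(values, gap=0.2, seed=0):
    """Build a noisy duel oracle (assumption: fixed win probability 1/2 + gap)."""
    rng = random.Random(seed)

    def duel(team_a, team_b):
        # Only disjoint teams may be compared in this setting.
        if set(team_a) & set(team_b):
            raise ValueError("teams must be disjoint")
        sa = team_strength(values, team_a)
        sb = team_strength(values, team_b)
        if sa == sb:
            p_a = 0.5
        elif sa > sb:
            p_a = 0.5 + gap
        else:
            p_a = 0.5 - gap
        return rng.random() < p_a  # True means team_a won this duel

    return duel


def estimate_win_rate(duel, team_a, team_b, trials=2000):
    """Empirical fraction of duels won by team_a against team_b."""
    wins = sum(duel(team_a, team_b) for _ in range(trials))
    return wins / trials


def condorcet_winning_teams(values, k):
    """Brute force over all k-sized teams, using the noiseless order.

    A team qualifies if it is at least as strong as every team disjoint from it.
    """
    teams = list(itertools.combinations(range(len(values)), k))
    winners = []
    for a in teams:
        rivals = [b for b in teams if not set(a) & set(b)]
        if all(team_strength(values, a) >= team_strength(values, b) for b in rivals):
            winners.append(a)
    return winners


# Powers of two give every team a distinct strength, so the order is total.
values = [32, 16, 8, 4, 2, 1]
duel = make_duel(values, gap=0.2, seed=0)
winners = condorcet_winning_teams(values, k=2)
rate = estimate_win_rate(duel, (0, 1), (2, 3))
```

In this toy instance every team containing player 0 beats all teams disjoint from it, so under these assumptions a Condorcet winning team need not be unique. The brute-force check enumerates all teams and is only meant to illustrate the definition, not to be efficient in the number of duels.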

