BanditQ -- No-Regret Learning with Guaranteed Per-User Rewards in Adversarial Environments - Zhuanzhi (专知) Papers


Online prediction · Online · Adversarial · Prediction algorithms · Lower bounds

April 11, 2023

BanditQ -- No-Regret Learning with Guaranteed Per-User Rewards in Adversarial Environments


Classic online prediction algorithms, such as Hedge, are inherently unfair by design, as they try to play the most rewarding arm as many times as possible while ignoring the sub-optimal arms to achieve sublinear regret. In this paper, we consider a fair online prediction problem in the adversarial setting with hard lower bounds on the rate of accrual of rewards for all arms. By combining elementary queueing theory with online learning, we propose a new online prediction policy, called BanditQ, that achieves the target rate constraints while achieving a regret of $O(T^{3/4})$ in the full-information setting. The design and analysis of BanditQ involve a novel use of the potential function method and are of independent interest.
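To make the fairness issue concrete, here is a minimal sketch of the classic Hedge (exponential weights) policy in the full-information setting. It is not the BanditQ policy from the paper; the toy reward sequence, the number of arms, the horizon `T`, and the step size `eta` are all assumptions chosen only for illustration. The sketch shows how Hedge concentrates its play on the best arm, so the reward accrued through a sub-optimal arm becomes negligible.

```python
import math


def hedge(reward_rounds, eta):
    """Run Hedge on a sequence of full-information reward vectors in [0, 1].

    Returns the per-round probability vectors and the expected total reward.
    """
    n_arms = len(reward_rounds[0])
    cum_rewards = [0.0] * n_arms
    probs_history = []
    expected_total = 0.0
    for rewards in reward_rounds:
        # Exponential weights on cumulative rewards; subtracting the max
        # keeps the exponentials numerically stable.
        top = max(cum_rewards)
        weights = [math.exp(eta * (c - top)) for c in cum_rewards]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        probs_history.append(probs)
        expected_total += sum(p * r for p, r in zip(probs, rewards))
        # Full information: the rewards of all arms are revealed after playing.
        cum_rewards = [c + r for c, r in zip(cum_rewards, rewards)]
    return probs_history, expected_total


# Hypothetical toy instance: three arms with fixed rewards (an assumption).
T = 2000
N_ARMS = 3
rounds = [[1.0, 0.5, 0.2]] * T
eta = math.sqrt(8 * math.log(N_ARMS) / T)  # a standard step-size choice

probs_history, expected_total = hedge(rounds, eta)

# Fraction of play and average accrued reward per arm.
avg_share = [sum(p[i] for p in probs_history) / T for i in range(N_ARMS)]
reward_rate = [
    sum(p[i] * r[i] for p, r in zip(probs_history, rounds)) / T
    for i in range(N_ARMS)
]

# Regret against the best fixed arm in hindsight.
best_fixed = max(sum(r[i] for r in rounds) for i in range(N_ARMS))
regret = best_fixed - expected_total

print("average play share per arm:", avg_share)
print("average reward rate per arm:", reward_rate)
print("regret:", regret)
```

Under these assumptions almost all of the probability mass ends up on the first arm, so any nontrivial lower bound on the reward rate of the other arms would be violated. That is the gap a rate-constrained policy such as the one described in the abstract is meant to close.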


