通过非理性谈判公平探索 (Fair Exploration via Axiomatic Bargaining) - 专知论文

会员服务 ·

0

赌博机/老虎机 · GROUP · 上下文赌博机/上下文老虎机 · Facebook AI Research · Bandits ·

2021 年 6 月 4 日

Fair Exploration via Axiomatic Bargaining

翻译：通过非理性谈判公平探索

Jackie Baek,Vivek F. Farias

Motivated by the consideration of fairly sharing the cost of exploration between multiple groups in learning problems, we develop the Nash bargaining solution in the context of multi-armed bandits. Specifically, the 'grouped' bandit associated with any multi-armed bandit problem associates, with each time step, a single group from some finite set of groups. The utility gained by a given group under some learning policy is naturally viewed as the reduction in that group's regret relative to the regret that group would have incurred 'on its own'. We derive policies that yield the Nash bargaining solution relative to the set of incremental utilities possible under any policy. We show that on the one hand, the 'price of fairness' under such policies is limited, while on the other hand, regret optimal policies are arbitrarily unfair under generic conditions. Our theoretical development is complemented by a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups.

翻译：出于考虑在学习问题中的多个群体之间公平分担勘探成本的考虑,我们在多武装强盗的背景下制定了纳什谈判方案。具体地说,与任何多武装强盗问题相关的“集团”强盗与每个时间步骤中某个有限群体中的单一群体联系在一起。一个群体在某种学习政策下获得的效用自然被视为该群体相对于该群体在“自己”情况下可能发生的遗憾而感到的遗憾的减少。我们制定的政策使得纳什谈判方案与任何政策下可能发生的一系列增量公用事业相比产生。我们表明,一方面,这种政策的“公平价格”是有限的,另一方面,在一般条件下,最优政策是任意不公平的,而另一方面,遗憾的是,最优政策是任意的。我们理论发展得到一个案例研究的补充,该研究涉及在多个种族和年龄组之间勘探成本的战争工夫的周围强盗行为。

0

相关内容

赌博机/老虎机

赌博机/老虎机

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

91+阅读 · 2020年7月4日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic

Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic

Arxiv

0+阅读 · 2021年7月29日

Clones in Deep Learning Code: What, Where, and Why?

Arxiv

0+阅读 · 2021年7月28日

Open-Ended Learning Leads to Generally Capable Agents

Arxiv

0+阅读 · 2021年7月27日

Maximizing Marginal Fairness for Dynamic Learning to Rank

Arxiv

7+阅读 · 2021年2月18日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Hierarchically Fair Federated Learning

Arxiv

3+阅读 · 2020年5月1日

Task-Free Continual Learning

Arxiv

6+阅读 · 2018年12月10日

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月17日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

VIP会员

文章信息

相关主题

赌博机/老虎机

上下文赌博机/上下文老虎机

Facebook AI Research

相关VIP内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

91+阅读 · 2020年7月4日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

851页！《潮涨之海：代数几何的基础》新书

从二维到三维认知：通用世界模型简要综述

航天遥感大模型发展综述与产业化应用展望

WWW 2025 | 基于模式引导的多智能体协同知识抽取框架

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic

Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic

Arxiv

0+阅读 · 2021年7月29日

Clones in Deep Learning Code: What, Where, and Why?

Arxiv

0+阅读 · 2021年7月28日

Open-Ended Learning Leads to Generally Capable Agents

Arxiv

0+阅读 · 2021年7月27日

Maximizing Marginal Fairness for Dynamic Learning to Rank

Arxiv

7+阅读 · 2021年2月18日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Hierarchically Fair Federated Learning

Arxiv

3+阅读 · 2020年5月1日

Task-Free Continual Learning

Arxiv

6+阅读 · 2018年12月10日

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月17日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

微信扫码咨询专知VIP会员