Reinforcement learning (RL) agents optimize only the features specified in a reward function and are indifferent to anything left out inadvertently. This means that we must specify not only what to do, but also the much larger space of what not to do. It is easy to forget these preferences, since they are already satisfied in our environment. This motivates our key insight: when a robot is deployed in an environment that humans act in, the state of the environment is already optimized for what humans want. We can therefore use this implicit preference information from the state to fill in the blanks. We develop an algorithm based on Maximum Causal Entropy IRL and use it to evaluate the idea in a suite of proof-of-concept environments designed to show its properties. We find that information from the initial state can be used to infer both side effects that should be avoided and preferences for how the environment should be organized. Our code can be found at https://github.com/HumanCompatibleAI/rlsp.
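To make the idea concrete, here is a minimal sketch of the underlying likelihood computation, assuming a toy tabular MDP; all names and dynamics below are illustrative, and this simplification omits most of the machinery in the full algorithm. We treat the observed state as the result of a human following a Maximum Causal Entropy (soft-optimal) policy for T steps, and ascend the gradient of that state's log-likelihood with respect to a linear reward.

```python
import numpy as np

# Hypothetical toy chain MDP (not one of the paper's environments): 4 states,
# actions "left"/"right", deterministic transitions, one-hot state features.
n_states, n_actions, T = 4, 2, 5
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0               # action 0: move left
    P[s, 1, min(s + 1, n_states - 1)] = 1.0    # action 1: move right
features = np.eye(n_states)

def soft_policy(theta, gamma=0.95, iters=100):
    """Soft (MaxCausalEnt) value iteration; returns a policy pi[s, a]."""
    r = features @ theta
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)        # Q[s, a]
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # stable logsumexp
    return np.exp(Q - V[:, None])               # softmax over actions

def log_likelihood(theta, s_obs, p0):
    """log P(state at time T equals s_obs | theta), marginalizing trajectories."""
    pi = soft_policy(theta)
    p = p0.copy()
    for _ in range(T):
        # state-marginal update: p'(s') = sum_{s,a} p(s) * pi(s, a) * P(s, a, s')
        p = np.einsum("s,sa,sax->x", p, pi, P)
    return np.log(p[s_obs] + 1e-12)

# Observed state: the "human" ended up at the rightmost cell, which under this
# model is evidence that the right end is rewarding.
s_obs, p0 = n_states - 1, np.full(n_states, 1.0 / n_states)

theta = np.zeros(n_states)
for _ in range(150):  # finite-difference gradient ascent (simple, not fast)
    grad = np.zeros_like(theta)
    for i in range(n_states):
        e = np.zeros(n_states)
        e[i] = 1e-4
        grad[i] = (log_likelihood(theta + e, s_obs, p0)
                   - log_likelihood(theta - e, s_obs, p0)) / 2e-4
    theta += 0.05 * grad

print("inferred reward per state:", np.round(theta, 2))
```

Finite differences keep the sketch dependency-free; a real implementation would compute the gradient analytically or with automatic differentiation.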